Bone Age Assessment Using Content-Based Image Retrieval System Using VGG-19 Deep Neural Network

Publication year: 1401 (Iranian calendar)
Document type: Journal article
Language: English
Views: 34

The full text of this article is available as a 19-page PDF file.



National scientific document ID:

JR_TEB-12-1_017

Indexing date: 30 Bahman 1402

Abstract:

INTRODUCTION
With the development of medical imaging devices, medical image production has increased significantly in recent years. Efficient management and retrieval of medical image datasets improve prevention and public health, but providing an accurate diagnosis while maintaining efficiency is challenging. Based on past studies, images that show a similar pathological condition help physicians and radiologists diagnose, record radiological reports, and plan treatment [1]. Content-based image retrieval is a process in which similar images are identified and retrieved from an extensive image database using a representation of the search image's content. Medical image retrieval systems have therefore been considered in education, diagnosis, care [2], and many other fields. Accurate and fast retrieval of images from image databases is the main challenge in this field, and it is all the more important for medical images because of the sensitivities associated with diagnosing and identifying abnormalities.

Image retrieval approaches
Image retrieval approaches are divided into three categories. In the first approach, text-based image retrieval, retrieval is performed based on the similarity between the keywords of the search image and those of the database images. For this purpose, specialists in the field must add and store descriptive keywords for every image in the database. For example, reference [3] presents a text-based method for mammography image retrieval in which experts added keyword descriptions for all images of the dataset, including normal/patient status, age, and other descriptive information. The limitations of this method, namely the need to add keywords or a detailed description of the patient's condition manually, motivated the content-based image retrieval approach. In the second approach, content-based image retrieval, image search and retrieval are performed based on the similarity between the visual content of the search image and that of the database images. In this approach, extracting an effective feature vector that accurately describes image properties such as color, texture, and shape is very important [4, 5]. In the third approach, semantic-based image retrieval, the search combines the previous two methods, i.e., text and image. By extracting semantic features from the search text and mapping them onto the visual features of the image, typically with the help of object identification and image segmentation techniques, this approach reduces the semantic gap and increases retrieval efficiency. For example, the approach presented in a 2023 study [6] combines clinical records in text form with the corresponding blood cell images to detect disease progression in blood cells. In the content-based image retrieval approach, which is the subject of this research, the methods for extracting the features on which the similarity between the search image and the database images is measured fall into two categories: handcrafted feature extraction and deep feature extraction. In handcrafted feature extraction, low-level features of the images, such as color, texture, and shape, are extracted to form a feature vector [7, 8]. HSV histograms, the gray-level co-occurrence matrix (GLCM), and wavelet-based methods are common techniques for extracting color and texture features from an image. For example, in the study of Mall, Singh, and Yadav [9], texture features in musculoskeletal radiographic images are extracted from the gray-level co-occurrence matrix (GLCM); these features represent second-order statistical information about the gray levels of neighboring pixels. Since a single feature cannot accurately describe an image, combining low-level features into the feature vector is more effective for image retrieval. In the study of Garg and Dhiman [10], a combination of texture and color features is used to extract features from medical images, and a feature selector based on particle swarm optimization is introduced to reduce the dimensionality of the feature vector.
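As a concrete illustration of the handcrafted texture features mentioned above, the following is a minimal sketch of computing GLCM-based descriptors with scikit-image. It is not taken from any of the cited studies; the chosen distances, angles, and property names are illustrative assumptions.

```python
import numpy as np
from skimage.feature import graycomatrix, graycoprops

def glcm_texture_features(gray_image: np.ndarray) -> np.ndarray:
    """Build a small handcrafted texture descriptor from an 8-bit grayscale image."""
    glcm = graycomatrix(
        gray_image,
        distances=[1, 2],                                # pixel-pair offsets (illustrative)
        angles=[0, np.pi / 4, np.pi / 2, 3 * np.pi / 4], # four directions
        levels=256,
        symmetric=True,
        normed=True,
    )
    # Second-order statistics of the gray levels of neighboring pixels.
    props = ["contrast", "homogeneity", "energy", "correlation"]
    return np.hstack([graycoprops(glcm, p).ravel() for p in props])
```

A color descriptor such as an HSV histogram could be concatenated to this vector in the same way to form the kind of combined low-level feature vector described above.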
In modern content-based image retrieval approaches, low-level features are replaced by deep features obtained from neural networks. Convolutional neural networks (CNNs) are usually pre-trained on very large datasets and therefore make effective feature extraction possible; by applying different filters to the original image, the convolution layers extract progressively more complex and deeper features [11]. The most effective deep-learning-based methods for content-based medical image retrieval are summarized below. Gordo et al. [12], pointing out the weakness of deep architectures in retrieving noisy images, used a pre-processing step to clean the dataset; they also used a Siamese network to learn the feature space. Shamna, Govindan, and Abdul Nazeer [4] presented an unsupervised content-based medical image retrieval framework based on spatial matching of visual words, in which the spatial similarity of visual words is calculated with a similarity measure called the skip similarity index. Chen et al. introduced an image retrieval model based on deep hashing [13]; this method uses multi-scale information and hierarchical similarity simultaneously to learn effective hash codes. Karthik and Kamath designed an approach for orientation labeling of body parts to reduce the variance among different medical scan images [14]; the learned features are first used to predict class labels and then to model the feature space for similarity calculation during retrieval. Since feature integration affects medical image retrieval results, owing to high-dimensional data and the large amount of irrelevant information in images, reference [15] introduced a hybrid method of fine-grained correlation analysis with the help of a deep neural network. In this method, the image is first divided into local regions, and regions with similar features are given the same label by a clustering algorithm; finally, correlating fine-grained samples and combining different features yield more distinctive information with less redundancy in medical image retrieval. Kobayashi et al. also introduced a neural network architecture that analyzes the semantic components of medical images with two labels [16]: a normal anatomy label and an abnormal anatomy label. The normal anatomy label represents the anatomy that should be present if the sample is healthy.
The introduced algorithm can retrieve images based on the semantic component selected from a dataset of glioma brain magnetic resonance images by calculating similarity based on the normal anatomy label, the abnormal anatomy label, or a combination of the two.

Bone age assessment
In medicine, human growth is tracked against chronological age, while biological age can be inferred from bone age. Various bones of the human body, such as those of the fingers and wrist, contain growth areas called growth plates. These plates contain special cells involved in the longitudinal growth of the bone and are easily distinguishable and recognizable in an X-ray image. Bone age is important in many fields, such as growth assessment and genetic disease screening [17]. Bones undergo many changes in shape during life, and the changes are especially pronounced during the growth period. The hand is connected to the lower arm through the radius and ulna bones. The bones of the hand consist of three parts: carpals, metacarpals, and phalanges, which together form thirty bone segments. Figure 1 shows the different parts of the palm and wrist bones. The large number of bones in a relatively small area and the relatively low radiation dose required make the hand an ideal organ for X-ray imaging. The cartilage of the growth plates causes these areas to appear darker on an X-ray image than the rest of the bone. Biometric features of this kind are generally not distinctive enough to identify individuals, but they provide additional information about a person's identity [18]. For this reason, bone age assessment is of interest both in medicine and in forensic medicine. Manual bone age assessment has commonly been carried out using the Greulich-Pyle (GP) or Tanner-Whitehouse (TW2) methods [19]. The GP method determines bone age by comparing the patient's radiograph with a reference age atlas, while the TW2 technique uses a scoring system that examines 20 specific bones. In both cases, the assessment requires considerable time, and because these clinical methods rely on subjective judgment, their accuracy depends on the physician's experience. Recently, solutions based on neural networks have been considered for evaluating bone age. Current methods based on convolutional neural networks usually rely on detecting key regions of the hand bones to predict bone age [20-22]. In the study of Spampinato et al. [20], three pre-trained convolutional neural networks are combined with the BoNet model using transfer learning. In the study of Liang et al. [22], a region-based convolutional neural network was proposed to identify the centers of the carpal bones and assess bone age; image features are extracted through the convolution layers, and regions of interest are identified automatically. To perform bone age assessment inspired by clinical approaches while reducing expensive manual annotation, reference [23] localizes informative regions with a fully unsupervised learning method and combines deep learning with classical machine learning to produce a reliable prediction. In the deep-learning-based variant, a pre-trained convolutional neural network automatically extracts image features; in the classical machine learning variant, edge-detection methods are used for feature extraction. The mean absolute error obtained by the CNN is reported, as it gave the better results. In the method presented by Cardoso et al. [24], a U-Net model is first used to obtain manually labeled key-point regions; a key-point detection model is then used to align the hand radiographs in a common coordinate space.
In the article by Escobar et al. [25], key points of the hand pose, together with manually labeled bounding boxes and annotations, are used during training to account for how the hand is positioned during imaging, and local information is used to evaluate the bone age. In this method, the MobileNetV3 neural network is used to extract deep features and form the feature vector of each image.

In this research, a bone age assessment method based on an image retrieval system is introduced. In this method, the features of each image are extracted from a fully connected layer of the VGG-19 neural network. Compared with other pre-trained neural networks such as GoogLeNet, DenseNet, and ResNet, the advantage of this network is its simple structure, with fewer convolution layers and fewer computations: the number of convolution layers in GoogLeNet, DenseNet, and ResNet is between 22 and 150, and their number of learnable parameters and computational complexity is much higher. In contrast, the VGG-19 network has a stack-like structure comprising 16 convolution layers and three fully connected layers. Even with fewer convolution layers, this network can effectively extract image features and patterns [26], and the algorithm can also be implemented on devices with limited computing resources, such as mobile phones. The image features are extracted with the VGG-19 network, in which different filters applied to the image extract different features suitable for describing image content. Since convolution is a linear operator, the extracted features are also linear; for this reason, principal component analysis (PCA) is used to reduce the dimensionality of the extracted feature vector. PCA is suitable for data with linear characteristics: it is an orthogonal linear transformation that maps the data to a new coordinate system and finds the directions that maximize the variance of the data. The feature vector of the search image is extracted in the same way. Finally, the images most similar to the search image are retrieved by calculating the Euclidean distance, and the bone age of the search image is estimated from the retrieved samples (Figure 1).

MATERIALS & METHODS
This observational study was conducted in 2023. The tested population was the "Digital Hand ATLAS" [27], with 1389 hand-image samples of people aged 1 to 18 years. This collection of images is categorized into four races: Asian, Black, Hispanic, and Caucasian. In addition to race, the gender of each sample is known, and the actual age of each sample is known in advance. Five samples were taken for each age group under ten years and ten samples for each age group over ten, for a total of 440 samples of people under ten years old and 949 samples of people aged 10 to 18. Figure 2 shows the flowchart of the proposed image retrieval method for bone age assessment. First, each image I_i in the image dataset S = {I_1, …, I_n} is fed to the pre-trained VGG-19 network [28]. The feature vector F(I_i) = {f_1, f_2, …, f_k} of each image is extracted from the FC7 fully connected layer. The set of feature vectors of the dataset images forms the initial feature space.
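The following is a minimal sketch of this feature-extraction step. The authors' implementation was written in MATLAB 2022a and is not published, so this PyTorch/torchvision version is only an illustrative assumption of how an FC7 feature vector can be read out of a pre-trained VGG-19.

```python
import torch
from torchvision import models, transforms
from PIL import Image

# Pre-trained VGG-19 with ImageNet weights (torchvision >= 0.13 weights API);
# dropout layers are inert in eval mode.
vgg = models.vgg19(weights=models.VGG19_Weights.IMAGENET1K_V1)
vgg.eval()

# Keep the convolutional stack plus classifier[0..3] (fc6 -> ReLU -> dropout -> fc7),
# so the output of this module is the 4096-dimensional FC7 feature vector.
fc7_extractor = torch.nn.Sequential(
    vgg.features,
    vgg.avgpool,
    torch.nn.Flatten(),
    *list(vgg.classifier.children())[:4],
)

preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],   # ImageNet statistics
                         std=[0.229, 0.224, 0.225]),
])

def fc7_features(path: str) -> torch.Tensor:
    """Return the 4096-dimensional FC7 feature vector of one hand radiograph."""
    img = Image.open(path).convert("RGB")    # replicate the grayscale X-ray to 3 channels
    x = preprocess(img).unsqueeze(0)         # add a batch dimension
    with torch.no_grad():
        return fc7_extractor(x).squeeze(0)
```

In torchvision's VGG-19, the first four classifier modules correspond to fc6, its ReLU, dropout, and fc7, so the extractor above stops exactly at the 4096-dimensional FC7 output.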
All the features extracted from the images do not have the same role and importance in separating the data. By reducing the dimensionality of the feature vectors, the speed of comparison and search can be improved while the data structure and the important features are preserved. For this purpose, the principal component analysis algorithm is used. This algorithm is a feature-mapping method; in feature mapping, the nature of the features is changed [29]. The algorithm preserves the overall structure of the data and represents the data in a feature space of lower dimensionality. With the help of dimensionality reduction, unnecessary features, which often degrade pattern recognition and retrieval performance, are also removed. After this step, a new feature vector F'(I_i) = {f_1, f_2, …, f_k} is formed for each image I_i, and in this way a database of new feature vectors is built for the database images. The feature vector of the search image, F(I_q) = {f_1, f_2, …, f_k}, is extracted in the same way. In the similarity measurement stage, the feature vector of the search image and the feature vectors of the database are compared by calculating the Euclidean distance, and the images with the highest degree of similarity to the search image are retrieved. After similar images are retrieved, the descriptions attached to each image are decoded and the bone age of each retrieved image is determined. Finally, the bone age of the search image is estimated by averaging the bone ages of the retrieved samples.

The VGG-19 deep neural network, whose architecture is shown in Figure 3, consists of convolution and pooling layers arranged in a stack-like structure, with 16 convolution layers and three fully connected layers. First come two convolutional layers with 64 filters of size 3×3, followed by a 2×2 max pooling layer with a stride of 2; this layer reduces the size of the feature maps and thereby the number of learnable network parameters. Next are two more convolutional layers with 128 filters of size 3×3 and a 2×2 max pooling layer with stride 2. Similarly, three convolutional layers with 256 filters of size 3×3 and one 2×2 max pooling layer with stride 2 follow. Two further blocks, each consisting of three convolutional layers with 512 filters of size 3×3 and a max pooling layer, complete the convolutional part of the network. Finally, the features pass into the fully connected layers, which produce a 4096-dimensional feature vector, and a neural layer with as many units as there are classes forms the last layer of the network. In the proposed method, only the feature vector of the fully connected layer is used as the feature vector. The activation function in all convolutional and fully connected layers is the ReLU (Rectified Linear Unit) function, which returns zero for negative inputs and the input value itself for positive inputs. The ReLU activation function is popular in deep networks because of its simple mathematical form and fast computation; its rule is A(x) = max(0, x). Different image details are identified by applying different filters in the convolution layers of the VGG-19 neural network. The representation obtained by visualizing the filter responses for an input image is shown in Figure 4; the displayed images differ in brightness, image edges, and texture patterns. This representation shows that the feature vector obtained from the VGG-19 neural network includes image content features that are extracted hierarchically from the image at different levels.
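As a concrete illustration of the dimensionality-reduction and similarity-measurement stages described above, the sketch below builds the reduced feature database with scikit-learn's PCA, ranks database images by Euclidean distance, and averages the bone ages of the top five hits. It assumes the FC7 feature vectors have already been extracted (for example with the fc7_features helper sketched earlier); the number of retained components (260) follows the figure reported in the FINDINGS, while everything else is an illustrative assumption rather than the authors' implementation.

```python
import numpy as np
from sklearn.decomposition import PCA

def build_index(features: np.ndarray, n_components: int = 260):
    """Fit PCA on the database feature matrix (one 4096-d row per image)."""
    pca = PCA(n_components=n_components)
    reduced = pca.fit_transform(features)      # F'(I_i) for every database image
    return pca, reduced

def retrieve_and_estimate_age(query_feat: np.ndarray,
                              pca: PCA,
                              reduced_db: np.ndarray,
                              bone_ages: np.ndarray,
                              top_k: int = 5):
    """Rank database images by Euclidean distance and average the top-k bone ages."""
    q = pca.transform(query_feat.reshape(1, -1))[0]   # F'(I_q)
    dists = np.linalg.norm(reduced_db - q, axis=1)    # Euclidean distances
    top = np.argsort(dists)[:top_k]                   # indices of the most similar images
    estimated_age = bone_ages[top].mean()             # simple average of the retrieved ages
    return top, dists[top], estimated_age
```

A similarity-weighted average (for instance, weighting each retrieved age by the inverse of its distance) would mirror the weighted criterion introduced next; the plain mean shown here follows the description of the retrieval stage.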
The weighted mean absolute error is an accepted criterion for the quantitative evaluation of bone age assessment results. Let (x_1, x_2, …, x_N) be the data samples, each described by k features, and let (y_1, y_2, …, y_N) be the actual bone ages of the N samples. The predicted bone age f(x_i) is compared with the actual value y_i. The criterion is computed as in the equation below, in which w_i is the similarity weight of each retrieved sample. In this method, the difference between the age of the evaluation (reference) sample and the bone ages of the five best-retrieved samples is summarized as the weighted mean absolute error; the lower this value is, and the closer it is to zero, the closer and more similar the retrieved samples are to the searched sample.

wMAE = (1 / Σ_i w_i) · Σ_{i=1}^{N} w_i · |y_i − f(x_i)|
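A short sketch of this criterion is given below, under the assumption that the similarity weights w_i come from the retrieval step (for instance, inverse Euclidean distances), which is an illustrative choice rather than a detail stated in the paper.

```python
import numpy as np

def weighted_mae(y_true: np.ndarray, y_pred: np.ndarray, weights: np.ndarray) -> float:
    """Weighted mean absolute error: (1 / sum(w_i)) * sum(w_i * |y_i - f(x_i)|)."""
    weights = np.asarray(weights, dtype=float)
    return float(np.sum(weights * np.abs(y_true - y_pred)) / np.sum(weights))

# Example: actual ages of five evaluation samples vs. retrieval-based estimates,
# weighted by hypothetical similarity weights such as inverse distances.
y_true = np.array([12.0, 16.0, 18.0, 9.0, 14.0])
y_pred = np.array([12.6, 16.8, 18.0, 9.4, 13.7])
w = np.array([1.8, 1.5, 2.1, 1.2, 1.6])
print(round(weighted_mae(y_true, y_pred, w), 3))   # wMAE in years
```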
Ethical Permissions: This research was reviewed and approved under the code of ethics IR.IAU.SRB.REC.1402.139 at the Islamic Azad University, Science and Research Branch. The ethical principles of the present study were fully respected; confidentiality was maintained, and the identities of the people in this dataset remain unknown.

Statistical Analysis: The proposed image retrieval method for bone age assessment was implemented, and its results analyzed, in MATLAB 2022a.

FINDINGS
The population evaluated in this study comprised 1389 hand X-ray image samples; sample images of this dataset are shown in Figure 5. The average number of samples for each age category was 77. The samples included males and females from all four races: Asian, Black, Hispanic, and Caucasian. For each test run, the top five retrieved samples and their ages were compared with the reference sample. For each image, a feature vector of 4096 features was extracted; after applying the principal component analysis dimensionality-reduction algorithm, 260 features were retained per image. The similarity of two images was measured by the Euclidean distance between the feature vector of the search image and the feature vectors of the database images, and the bone age evaluation criterion was the mean absolute error. The evaluation of the proposed approach is presented in two parts: in the first part, the quantitative results are examined and compared with previous methods; the second part is a qualitative evaluation of the retrieved images and their relationship to the searched image. The retrieval quality of the proposed method was demonstrated by evaluating the retrieved samples for several search images; this evaluation checks the quality and correlation of the retrieved samples in response to the search image. The average bone age of the top five retrieved samples was used to estimate the bone age of the search image. The comparison of retrieval results for three search images from different races is shown in Table 2. The first example concerns image 5020 from the digital hand image ATLAS dataset. This image belonged to an 18-year-old person in the Asian group; since the retrieved samples also belonged to the same group, the bone age of the search image was confirmed to be 18 years. The second evaluation example in Table 2 concerns image number 3245, belonging to a 12-year-old Black person from the digital hand image ATLAS dataset. Among the retrieved samples, two belonged to the group of 12-year-olds and the other three to the group of 13-year-olds, so the bone age of the search image was estimated to be 12.57 years, a bone age difference of about 0.6 years for this sample. The third example in Table 2 is image 5103, belonging to a 16-year-old Hispanic person. Among the retrieved samples, three belonged to 16-year-olds, one to a 17-year-old, and one to an 18-year-old; the bone age of the search image in this example was estimated to be 16.76 years.

DISCUSSION
The present study investigated the reliability of an automatic bone age assessment method based on an image retrieval system and showed an error of less than four months in bone age assessment. The proposed method is comparable with the findings of related studies, including references [20-25]. In one study, the BoNet network was introduced by combining three neural networks with transfer learning to transfer the training domain [20]; the mean absolute error in bone age estimation reported in that paper is 0.79. In another study, a pre-trained neural network was retrained with additional information sources such as gender [21]; the mean absolute error of that method is reported to be 0.62. In reference [22], a bone age assessment model based on a region-based convolutional neural network (R-CNN) is proposed; this diagnostic method focuses on bone age regression to identify the ossification centers of the epiphysis and the carpal bones, and large-scale X-ray images are used as the network input. The mean absolute error of this method is reported to be 0.51. In the method presented by Cardoso et al. [24], the MobileNet network is used to extract image features for bone age evaluation, and the estimation error is reported to be 1.4 years. This method pays attention to the position of the hand in the image, so the extracted features are limited to certain areas of the image and are considered locally rather than globally; its bone age assessment error is reported as a mean absolute error of 0.62. Many efforts have been made to increase the accuracy of bone age detection and estimation with the help of deep neural networks; complex architectures, long training times, and the need for large numbers of training samples in the retraining process are among the problems of the mentioned methods. In the proposed method, in addition to feature extraction, attention was paid to reducing the dimensionality of the feature vector in order to shorten the comparison with the dataset samples; removing ineffective features and reducing the comparison time were investigated. The best results showed a weighted mean absolute error of 0.29 years (3.4 months). Despite the appropriate performance of the proposed method in assessing and estimating bone age, increasing the accuracy of retrieving similar samples by combining local and global features without introducing redundancy is the focus of future studies of this research.
Despite the performance of methods based on smart algorithms, it is important to note that bone age assessment should be combined with other investigative techniques, and qualified medical professionals should use this tool to increase accuracy and reliability. This study had limitations, such as limited access to domestic samples for more detailed evaluations. Although the bone density and growth pattern of the Iranian population fall within the category of Asian samples and are thus covered by the evaluated "Digital Hand ATLAS" dataset, investigating and localizing methods based on smart algorithms for the implementation of smart systems in medicine requires local data.

CONCLUSION
Based on this research, evaluating bone age with the help of image retrieval is an effective method for estimating bone age. Experts in this field can therefore use this method to verify and determine the age of people without identity documents and in other related matters.

Clinical & Practical Tips in POLICE MEDICINE: One application of bone age assessment in police investigations is determining the age of unknown persons. In cases where a person's age cannot be determined by other means, such as identification documents or witness testimony, bone age assessment can estimate the person's age based on skeletal maturity. Another use of bone age assessment is in cases of suspected child abuse or neglect. In some cases, it can be challenging to determine the age of a child who has been abused or neglected, especially if the child has been denied proper nutrition or medical care. Using bone age assessment, investigators can estimate a child's age and determine whether the care provided has resulted in normal growth. In addition to the above, bone age assessment is also used in cases of human trafficking or illegal immigration where a person's age is unclear or disputed. With bone age assessment, the maturity or immaturity of a person can be recognized, and this information can be used to determine the appropriate legal and social services for the person, or the extent of a crime according to the person's age.

Conflict of interest: The authors state that there is no conflict of interest in the present study.

Authors' Contribution: First author: presenting the idea; second author: presentation and data analysis; third author: data analysis. All authors participated in the final writing and revision of the article and accept responsibility for the accuracy and correctness of its contents.

Financial Sources: The current research received no financial support from governmental or private bodies.

Figure 1) Human palm and wrist bones
Figure 2) Flowchart of the proposed image retrieval system
Figure 3) VGG-19 neural network architecture
Figure 4) Visualization of applying the convolution operator to an input image
Figure 5) Examples from the digital hand image atlas

Table 1) Comparison of bone age assessment with other methods
Method              Mean absolute error (years)
Method [20], 2017   0.79
Method [21], 2021   0.62
Method [24], 2019   1.47
Method [22], 2019   0.52
Method [25], 2020   0.54
Proposed method     0.29

Table 2) Sample retrieval results

Keywords:

Authors

بهنام درستکار یاقوتی

Department of Information and Communication, Amin University, Tehran, Iran

کامبیز رهبر

Department of Computer Engineering, South Tehran Branch, Islamic Azad University, Tehran, Iran

فاطمه طاهری

Department of Computer Engineering, South Tehran Branch, Islamic Azad University, Tehran, Iran

References and sources of this article:

The list below shows the references and sources used in this article.
  • Silva W, Gonçalves T, Härmä K, Schröder E, Obmann VC, ...
  • Pathak D, Raju USN. Content-based image retrieval using feature-fusion of ...
  • Takagi Y, Hashimoto N, Masuda H, Miyoshi H, Ohshima K, ...
  • https://doi.org/10.1016/j.jpi.2022.100185
  • Wang X, Du Y, Yang S, Zhang J, Wang M, ...
  • Taheri F, Rahbar K, Salimi P. Effective features in content-based ...
  • Karthik K, Kamath SS. A deep neural network model for ...
  • Kobayashi K, Hataya R, Kurose Y, Miyake M, Takahashi M, ...
  • Campbell D, William Garrett Jr by E, Speer KP. Assessment ...
  • Babaei M, Shirzad J, Keshavarz Meshkin Pham K, Faghih Fard ...
  • https://doi.org/10.1297/cpe.24.143
  • Spampinato C, Palazzo S, Giordano D, Aldinucci M, Leonardi R. ...
  • Wibisono A, Saputri MS, Mursanto P, Rachmad J, Alberto, Yudasubrata ...
  • https://doi.org/10.1186/s13104-020-05343-4
  • De Capitani di Vimercati S, Foresti S, Livraga G, Samarati ...