This study investigates how variation in the time intervals between self-assessment test sets influences the performance of radiologists and radiology trainees. Data were collected from 54 radiologists and 92 trainees who completed 260 and 550 readings, respectively, of 9 mammogram test sets between 2019 and 2023. Readers’ performances were evaluated via case sensitivity, lesion sensitivity, specificity, ROC AUC and JAFROC. There was a significant positive correlation between the intervals between test sets and radiologists’ improvement in specificity and JAFROC (P<0.05). For separations between test sets exceeding 90 days, radiologists’ performance improved for sensitivity (5.2%), lesion sensitivity (6.6%), ROC (3.1%) and JAFROC (6.3%), with specificity remaining consistent. For trainees who completed test sets within a single day, a significant positive correlation was recorded between the time intervals between test sets and their improvement in ROC AUC (P=0.008) and JAFROC (P=0.02). However, for trainees who needed more than 1 day to complete a test set, this correlation was reversed for sensitivity (P=0.009) and ROC AUC (P=0.02). The most notable progress among trainees was found in sensitivity (6.15%), lesion sensitivity (11.6%), ROC AUC (3.5%) and JAFROC (4.35%), with specificity remaining unchanged, when the test sets were completed 31-90 days apart.
This paper investigates whether two publicly available Artificial Intelligence (AI) models can detect retrospectively identified missed cancers within a double reader breast screening program and determines whether challenging mammographic cases are reflected in the performance of AI models. Transfer learning was conducted on the Globally-aware Multiple Instance Classifier (GMIC) and Global-Local Activation Maps (GLAM) models using an Australian mammographic dataset. Mammograms were enhanced to improve poor contrast using the Contrast Limited Adaptive Histogram Equalization (CLAHE) algorithm. The sensitivity of the two AI models in pre-trained and transfer learning modes was evaluated on four mammographic case groups: ‘missed’ cancers, ‘prior-visible’ cancers, ‘prior-invisible’ cancers and ‘current’ cancers from the archives of a double reader breast screening program. The GMIC model outperformed the GLAM model in both pre-trained and transfer learning modes in terms of sensitivity for all four cancer groups. The performance of the GMIC and GLAM models was best in ‘prior-visible’ cancers, followed by ‘prior-invisible’ cancers, ‘current’ cancers and ‘missed’ cancers. The performance of the GMIC and GLAM models on the ‘missed’ cancer cases was 84.2% and 81.5%, respectively, while for the ‘prior-visible’ cancer cases, the performance was 92.7% and 89.2%, respectively. After transfer learning, both the GMIC and GLAM models demonstrated statistically significant improvement (>9.4%) in terms of sensitivity for all cancer groups. The AI models with transfer learning showed significant improvement in malignancy detection in challenging mammographic cases. The study also supports the potential of the AI models to identify missed cancers within a double reader breast screening program.
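The per-group sensitivity reported above can be illustrated with a minimal sketch. It assumes each cancer case carries a model malignancy score in [0, 1] and is counted as detected when the score reaches a decision threshold. The threshold of 0.5, the function name `sensitivity_by_group` and the scores below are hypothetical and are not taken from the study.

```python
from collections import defaultdict


def sensitivity_by_group(cases, threshold=0.5):
    """Return the fraction of cancer cases detected in each case group.

    cases: iterable of (group_name, malignancy_score) for cancer cases only.
    threshold: assumed decision threshold (hypothetical, not from the study).
    """
    detected = defaultdict(int)
    totals = defaultdict(int)
    for group, score in cases:
        totals[group] += 1
        if score >= threshold:
            detected[group] += 1
    return {group: detected[group] / totals[group] for group in totals}


# Hypothetical model scores for six cancer cases in three of the groups.
cases = [
    ("missed", 0.62), ("missed", 0.41),
    ("prior-visible", 0.91), ("prior-visible", 0.77),
    ("current", 0.55), ("current", 0.30),
]
result = sensitivity_by_group(cases)
```

Comparing the resulting dictionary before and after transfer learning would give the per-group change in sensitivity of the kind the abstract describes.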
KEYWORDS: Digital breast tomosynthesis, Cancer, Cancer detection, Breast cancer, Education and training, Mammography, Breast, Diagnostics, Architectural distortion, Breast density
Introduction: Breast cancer is the most common cancer among women in China and early detection is key to reducing mortality. This study aimed to compare the diagnostic performance of Chinese radiologists reading FFDM (full-field digital mammography) and DBT (digital breast tomosynthesis) images in terms of lesion features and reader characteristics.
Methods: 32 Chinese radiologists read two mammogram test sets to identify cancer cases and to detect lesions. The first set was of FFDM images (60 cases, 21 cancers) and the second was of DBT images (35 cases, 15 cancers). The accuracy of radiologists in cancer case detection and lesion detection in each test set was analysed. The diagnostic performances of radiologists with different levels of working experience were also compared. Results were compared using the Wilcoxon signed-rank and Mann-Whitney U tests.
Results: Chinese radiologists recorded higher diagnostic accuracy with FFDM than DBT for detecting certain lesion types (calcifications, architectural distortion, mixed types) and lesions ≤ 10 mm. There was no significant difference in the accuracy for cancer case detection between FFDM and DBT. Radiologists who had more than eight years’ working experience, read more than 60 cases per week or had no DBT training had significantly higher lesion accuracy with FFDM than DBT.
Conclusion: Chinese radiologists had higher lesion accuracy with FFDM in certain lesion types and sizes than DBT. This may be related to the lack of appropriate DBT training for radiologists in China.
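The Mann-Whitney U test named in the Methods compares two independent reader groups, for example radiologists with more and with less working experience. The following is a minimal sketch of the U statistic only, using average ranks for ties. The accuracy values are hypothetical, and in practice a p-value would usually come from a statistics package such as `scipy.stats.mannwhitneyu` rather than from this code.

```python
def midranks(values):
    """Assign 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """Return (U_x, U_y) for two independent samples."""
    ranks = midranks(list(x) + list(y))
    rank_sum_x = sum(ranks[:len(x)])
    u_x = rank_sum_x - len(x) * (len(x) + 1) / 2
    u_y = len(x) * len(y) - u_x
    return u_x, u_y


# Hypothetical lesion accuracy for two reader groups (not study data).
more_experienced = [0.80, 0.85, 0.90]
less_experienced = [0.60, 0.70, 0.85]
u_more, u_less = mann_whitney_u(more_experienced, less_experienced)
```

Here `u_more` counts the reader pairs in which the more experienced reader scored higher, with ties counted as one half.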
KEYWORDS: Mammography, Current controlled current source, Artificial intelligence, Education and training, Breast density, Cancer, Cancer detection, Breast cancer
This preliminary study investigates the magnitude of concordance, the factors affecting it and the limitations when radiologists annotate mammographic images. Annotated data is key to the development of artificial intelligence (AI) tools and errors from annotations can reduce the accuracy of these tools. Two highly experienced radiologists (>20 years’ experience) provided annotations as rectangular regions of interest to mark the location of lesions when they read 856 mammographic images with known cancer signs. Mammographic images were resized to the same resolution of 1664 × 768 pixels using bilinear interpolation. We calculated Lin’s concordance correlation coefficient (CCC) between the x-axis and y-axis coordinates of the 4 corners of the overlapped annotations. The two overlapped annotations in different views (cranio-caudal (CC) and medio-lateral oblique (MLO)) were evaluated for agreement between radiologists. The values of Lin’s CCC were classified into four interpretation levels: ‘almost perfect’, ‘substantial’, ‘moderate’ and ‘poor’, according to McBride's guide (2015). The results demonstrated ‘almost perfect’, ‘substantial’, ‘moderate’ and ‘poor’ concordance in 50.1%, 29.8%, 9.5% and 10.6% of the total overlapped annotations in the MLO view, and in 93.1%, 5.6%, 0.3% and 1.0% of the total overlapped annotations in the CC view, respectively. Overall, the radiologists demonstrated stronger concordance when annotating the CC view than the MLO view. Breast density (BD) also affected the concordance of the radiologists’ annotations: concordance was higher for cases with 0-50% BD than for cases with 50-100% BD. Our annotation investigation has implications for AI, where delineation of lesions is often the starting point for training data.
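Lin’s CCC, as used above, measures how closely paired measurements fall on the line of identity. A minimal sketch follows, assuming the standard definition 2·s_xy / (s_x² + s_y² + (mean_x − mean_y)²) with population (1/n) variances. The corner coordinates are hypothetical. The cut-offs in `mcbride_label` are the values commonly attributed to McBride's criteria and are an assumption here, since the abstract does not state them.

```python
def lins_ccc(x, y):
    """Lin's concordance correlation coefficient for paired measurements."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    var_x = sum((a - mean_x) ** 2 for a in x) / n
    var_y = sum((b - mean_y) ** 2 for b in y) / n
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    return 2 * cov_xy / (var_x + var_y + (mean_x - mean_y) ** 2)


def mcbride_label(ccc):
    """Map a CCC value to an interpretation level (assumed cut-offs)."""
    if ccc > 0.99:
        return "almost perfect"
    if ccc >= 0.95:
        return "substantial"
    if ccc >= 0.90:
        return "moderate"
    return "poor"


# Hypothetical x-coordinates (pixels) of one annotation corner for four
# lesions, as marked by two readers; reader B is shifted by 2 pixels.
reader_a = [10, 20, 30, 40]
reader_b = [12, 22, 32, 42]
ccc = lins_ccc(reader_a, reader_b)
label = mcbride_label(ccc)
```

Unlike Pearson correlation, which would be exactly 1 for these data, the CCC penalises the constant 2-pixel offset between readers.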
KEYWORDS: Digital breast tomosynthesis, Breast density, Mammography, Breast, Cancer, Education and training, Diagnostics, Breast cancer, Cancer detection, Radiology
Purpose: This study aims to investigate the diagnostic performances of Australian and Shanghai-based Chinese radiologists in reading full-field digital mammogram (FFDM) and digital breast tomosynthesis (DBT) with different levels of breast density.
Approach: Eighty-two Australian radiologists interpreted a 60-case FFDM set, and 29 radiologists also reported a 35-case DBT set. Sixty Shanghai radiologists read the same FFDM set, and 32 radiologists read the DBT set. The diagnostic performances of Australian and Shanghai radiologists were assessed using truth data (cancer cases were biopsy proven) and compared overall in specificity, case sensitivity, lesion sensitivity, receiver operating characteristics (ROC) area under the curve, and jack-knife free-response receiver operating characteristics (JAFROC) figure of merit, and they were stratified by case characteristics using the Mann–Whitney U test. The Spearman rank test was used to explore the association between radiologists’ performances and their work experience in mammogram interpretation.
Results: There were significantly higher performances of Australian radiologists compared with Shanghai radiologists in low breast density for case sensitivity, lesion sensitivity, ROC, and JAFROC in the FFDM set (P < 0.0001); in high breast density, Shanghai radiologists’ performances in lesion sensitivity and JAFROC were also lower than Australian radiologists (P < 0.0001). In the DBT test set, Australian radiologists performed better than Shanghai radiologists in cancer detection in both low and high breast density. The work experience of Australian radiologists was positively linked to their diagnostic performances, whereas this association was not statistically significant in Shanghai radiologists.
Conclusion: There were significant variations in reading performances between Australian and Shanghai radiologists in FFDM and DBT across different levels of breast density, lesion types, and lesion sizes. An effective training initiative tailored to suit local readers is essential to enhancing the diagnostic accuracy of Shanghai radiologists.
Objectives: To study the effect of the availability of prior screening mammograms on radiology trainees’ observer performance across seven unique educational test sets. Methods: Australian radiology trainees (n=150) completed 469 readings of seven educational test sets (each set with 60 cases, 40 normal and 20 cancer cases). The percentage of cases with a prior screening mammogram was 68.7%. Mammographic density (MD) evaluated via BIRADS was spread across the test sets, with 40.5% of cases having 25-50% glandular tissue (BIRADS “B”), 37.4% having 50-75% or “C”, 12.6% having >75% MD and 9.5% having the lowest MD rating “A”. Trainees were asked to score the cases on a scale of 1 (normal), 2 (benign), 3 (equivocal findings), 4 (suspicious finding) and 5 (highly suggestive of malignancy). The Mann-Whitney U test was used to compare the specificity and sensitivity of radiology trainees between cases with and without prior images. Results: Radiology trainees had significantly higher sensitivity across all MD levels when prior images were not available (A-B, P=0.006; C-D, P=0.027). Among trainees who read fewer than 20 cases per week, specificity was also significantly higher for cases of high (C-D) MD without prior images than for those with prior images (P=0.008). Conclusions: In a simulated environment, radiology trainees achieved better results in cases without prior images, especially those who read fewer than 20 cases per week. The utility of prior case inclusion when providing education and training in reading screening mammograms needs to be revisited, especially for women with high MD.
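Sensitivity and specificity can be derived from the 1 to 5 rating scale described above once a rating is chosen as the cut-off for a positive call. The sketch below assumes that ratings of 3 or higher count as positive. The abstract does not state the cut-off, so `recall_threshold=3` is an assumption, and the ratings are hypothetical.

```python
def sensitivity_specificity(ratings, is_cancer, recall_threshold=3):
    """Compute (sensitivity, specificity) from 1-5 confidence ratings.

    ratings: one rating per case on the 1-5 scale.
    is_cancer: matching booleans giving the ground truth for each case.
    recall_threshold: assumed lowest rating treated as a positive call.
    """
    tp = fn = tn = fp = 0
    for rating, cancer in zip(ratings, is_cancer):
        positive = rating >= recall_threshold
        if cancer and positive:
            tp += 1
        elif cancer:
            fn += 1
        elif positive:
            fp += 1
        else:
            tn += 1
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical readings of three normal and three cancer cases.
ratings = [1, 2, 3, 4, 5, 2]
is_cancer = [False, False, False, True, True, True]
sensitivity, specificity = sensitivity_specificity(ratings, is_cancer)
```

Running this separately on cases with and without prior images would give the two sets of per-reader values that a Mann-Whitney U test could then compare.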
Current literature has described the usefulness of DBT in addition to FFDM because of the increase in cancer detection and decrease in recall rates. The primary limitation of using FFDM plus DBT for screening is the rise in radiation dose, which approximately doubles if both modalities are used. Consequently, synthesized two-dimensional views can be reconstructed from DBT slices with the aim of replacing FFDM. Although many studies have explored the value of DBT in addition to FFDM, little attention has been given to the benefit that synthesized views might bring to radiologists as a supplementary view for DBT. The aim of this study is to investigate the diagnostic accuracy of radiology trainees with DBT only compared with DBT plus the synthesized view (C-View). Twenty radiology trainees were asked to report a set of 35 two-projection DBT images of left and right breasts (15 were cancer cases). Another group of 8 trainees read the same DBT set with the addition of the C-View. Participants searched for the presence of lesions within the cases using the Tabar RANZCR system, where 2 represented a benign lesion and 3-5 represented the suspicion of a malignancy, with a higher value indicating a higher probability of malignancy. The readers’ performances were evaluated via specificity, sensitivity, lesion sensitivity, ROC and JAFROC and compared between the two reading modes. The results demonstrated that the diagnostic metrics of participants reading DBT only were not significantly different from those of the group reading DBT plus the synthesized view (P>0.05). This finding implies that viewing DBT only could be equivalent to DBT plus C-View for radiology trainees.