Silicosis is an occupational lung disease (a pneumoconiosis) caused by the inhalation of crystalline silica dust, and it can lead to fatal respiratory conditions. This study aims to develop an online platform and benchmark radiologists' performance in diagnosing silicosis. Fifty readers (33 radiologists and 17 radiology trainees) interpreted a test-set of 15 HRCT cases. The median AUROC for all readers combined was 0.92 (0.93 for radiologists and 0.91 for trainees). No statistically significant difference in performance was observed between radiologists and trainees. Moderate agreement was recorded among readers for the correct diagnosis of silicosis (κ=0.57); however, there was considerable variability (κ<0.2) in the accurate detection of irregular opacities and ground glass opacities. Our online platform shows promise in providing tailored education to clinicians and in facilitating future long-term observer studies and the development of educational solutions to enhance the diagnostic accuracy of silicosis detection.
KEYWORDS: Digital breast tomosynthesis, Cancer, Cancer detection, Breast cancer, Education and training, Mammography, Breast, Diagnostics, Architectural distortion, Breast density
Introduction: Breast cancer is the most common cancer among women in China and early detection is key to reducing mortality. This study aimed to understand diagnostic performances of Chinese radiologists between FFDM (full-field digital mammography) and DBT (digital breast tomosynthesis) images in terms of lesion features and reader characteristics.
Methods: Thirty-two Chinese radiologists read two mammogram test sets to identify cancer cases and to detect lesions. The first set was of FFDM images (60 cases, 21 cancers) and the second was of DBT images (35 cases, 15 cancers). The radiologists' accuracy in cancer case detection and lesion detection was analysed for each test set. The diagnostic performances of radiologists with different levels of working experience were also compared. Results were compared using the Wilcoxon signed-rank and Mann-Whitney U tests.
Results: Chinese radiologists recorded higher diagnostic accuracy with FFDM than DBT for detecting certain lesion types (calcifications, architectural distortion, mixed types) and lesions ≤ 10 mm. There was no significant difference in the accuracy for cancer case detection between FFDM and DBT. Radiologists who had more than eight years working experience, read more than 60 cases per week or had no DBT training had significantly higher lesion accuracy with FFDM than DBT.
Conclusion: Chinese radiologists had higher lesion accuracy with FFDM in certain lesion types and sizes than DBT. This may be related to the lack of appropriate DBT training for radiologists in China.
KEYWORDS: Digital breast tomosynthesis, Breast density, Mammography, Breast, Cancer, Education and training, Diagnostics, Breast cancer, Cancer detection, Radiology
Purpose: This study aims to investigate the diagnostic performances of Australian and Shanghai-based Chinese radiologists in reading full-field digital mammogram (FFDM) and digital breast tomosynthesis (DBT) with different levels of breast density. Approach: Eighty-two Australian radiologists interpreted a 60-case FFDM set, and 29 radiologists also reported a 35-case DBT set. Sixty Shanghai radiologists read the same FFDM set, and 32 radiologists read the DBT set. The diagnostic performances of Australian and Shanghai radiologists were assessed using truth data (cancer cases were biopsy proven) and compared overall in specificity, case sensitivity, lesion sensitivity, receiver operating characteristics (ROC) area under the curve, and jack-knife free-response receiver operating characteristics (JAFROC) figure of merit, and they were stratified by case characteristics using the Mann–Whitney U test. The Spearman rank test was used to explore the association between radiologists’ performances and their work experience in mammogram interpretation. Results: There were significantly higher performances of Australian radiologists compared with Shanghai radiologists in low breast density for case sensitivity, lesion sensitivity, ROC, and JAFROC in the FFDM set (P < 0.0001); in high breast density, Shanghai radiologists’ performances in lesion sensitivity and JAFROC were also lower than Australian radiologists (P < 0.0001). In the DBT test set, Australian radiologists performed better than Shanghai radiologists in cancer detection in both low and high breast density. The work experience of Australian radiologists was positively linked to their diagnostic performances, whereas this association was not statistically significant in Shanghai radiologists. Conclusion: There were significant variations in reading performances between Australian and Shanghai radiologists in FFDM and DBT across different levels of breast density, lesion types, and lesion sizes. An effective training initiative tailored to suit local readers is essential to enhancing the diagnostic accuracy of Shanghai radiologists.
This study aimed to investigate the effect on reading performance of how long radiologists have been awake (“time awake”) and the number of hours they slept at night (“hours slept at night”) before a reading session. Data from 133 mammographic reading assessments were extracted from the Breast Screen Reader Assessment Strategy database. Analysis of covariance was performed to determine whether sensitivity, specificity, lesion sensitivity, ROC, and JAFROC were influenced by the time awake and the hours slept at night. The results showed that less experienced radiologists’ performance varied significantly according to the time awake: lesion sensitivity was significantly lower among radiologists who performed readings after being awake for less than 2 h (44.6%) than among those who had been awake for 8 to <10 h (71.03%; p = 0.013); likewise, the same metric was significantly lower among those who had been awake for 4 to <6 h (47.7%) than among those who had been awake for 8 to <10 h (71.03%; p = 0.002) and for 10 to <12 h (63.46%; p = 0.026). The ROC values of the less experienced radiologists also seemed to depend on the hours slept at night: values for those who had slept ≤6 h (0.72) were significantly lower than for those who had slept >6 h (0.77) (p = 0.029). The results indicate that inexperienced radiologists’ performance may be affected by the time awake and hours slept the night before a reading session.
This study explored whether better performance under usual presentation conditions, more years of experience, and a higher annual volume of mammogram assessment make a radiologist better at perceiving the gist of the abnormal on a mammogram. Nineteen radiologists were recruited for two experiments. In the first (gist experiment), the initial impressions of the radiologists were collected based on a half-second image presentation on a scale from 0 (confident normal) to 100 (confident abnormal). In the second, radiologists viewed a similar set of cases using the BreastScreen Reader Assessment Strategy platform and rated each case on a scale of 1-5. Using Spearman correlation, we explored whether the area under the receiver operating characteristic curve (AUC) values in the two experiments were correlated. Radiologists were also grouped based on variables describing their experience levels and workload, and their performance in both experiments was compared among the groups. The AUC values in the gist experiment were not significantly correlated with the AUC values in the normal reporting experiment (Spearman correlation=0.183, p-value=0.453). Radiologists’ performance under the normal reporting conditions was linked to the number of cases read per week (p=0.044), the number of hours per week currently spent reading mammograms (p=0.028), and the number of years they have been reading mammograms (p=0.041). However, none of the variables reached a p-value<0.05 for the AUC of the gist experiment. The results suggest that further studies should be done to establish relationships between the gist response and radiologists’ characteristics, since being a high-performing radiologist, being a highly experienced radiologist, or reading a high volume of mammograms does not indicate superior capability when perceiving the gist of the abnormal.
This study explored the possibility of using the gist signal (radiologists’ first impression about a case) for improving the performance of two recently developed deep learning-based breast cancer detection tools. We investigated whether higher performance in identifying malignant cases can be achieved by combining the cancer class probability from the networks with the gist signal. In total, we recruited 53 radiologists, who provided an abnormality score on a scale from 0 to 100 to unilateral mammograms following a 500-millisecond presentation of the image. Twenty cancer cases, 40 benign cases, and 20 normal cases were included. Two state-of-the-art deep learning-based tools (M1 and M2) for breast cancer detection were adopted. The abnormality scores from the networks and the gist responses for each observer were fed into a support vector machine (SVM). The SVM was personalized for each radiologist and its performance was evaluated using leave-one-out cross-validation. We also considered the average reader, whose gist responses were the mean abnormality scores given by all 53 readers to each image. The mean and range of AUCs in the gist experiment were 0.643 and 0.492-0.794, respectively. The AUC values for M1 and M2 were 0.789 (0.632-0.892) and 0.814 (0.673-0.897), respectively. For the average reader, the AUCs for gist, gist+M1, and gist+M2 were 0.760 (0.617-0.862), 0.847 (0.754-0.928), and 0.897 (0.789-0.946), respectively. For 45 readers, the performance of at least one of the models improved after aggregating its output with the gist signal. The results showed that the gist signal has the potential to improve the performance of the adopted deep learning-based tools.
Numerous factors contribute to radiologist image reading discrepancy and interpretive errors. However, a factor often overlooked is how interpretations might be impacted by the time of day when the image reading takes place—a factor that other disciplines have shown to be a determinant of competency. This study therefore seeks to investigate whether radiologists’ reading performances vary according to the time of the day at which the readings take place. We evaluated 197 mammographic reading assessments collected from the BreastScreen Reader Assessment Strategy (BREAST) database, which included reading timestamps and radiologists’ demographic data, and conducted an analysis of covariance to determine whether time of day influenced the radiologists’ specificity, lesion sensitivity, and jackknife alternative free-response receiver operating characteristic (JAFROC). After adjusting for radiologist experience and fellowship, we found a significant effect of the time of day of the readings on specificity but none on lesion sensitivity or JAFROC. Radiologist specificity was significantly lower in the late morning (10 am–12 pm) and late afternoon (4 pm–6 pm) than in the early morning (8 am–10 am) or early afternoon (2 pm–4 pm), indicating a higher rate of false-positive interpretations in the late morning and late afternoon. Thus, the time of day mammographic image readings take place may influence radiologists’ performances, specifically their ability to identify normal images correctly. These findings present significant implications for radiologic clinicians.
This study measured the correlation between the magnitude of the presence of the abnormality gist and case difficulty based on standard presentation and reporting mechanisms for 80 cases. Half of the cases contained biopsy-proven cancer while the remainder were normal and confirmed to be cancer-free for at least two years of follow-up. In the gist experiment, seventeen breast radiologists and physicians gave an abnormality score on a scale from 0 (confident normal) to 100 (confident abnormal) to unilateral CC mammograms following a very brief, 500 millisecond presentation of the image. Independently, each mammogram was assessed by a separate sample of at least 40 radiologists using standard presentation and reporting mechanisms, with these readers asked to locate any cancers present. All readers reported at least 1000 cases annually. For each case and each category, the percentage of correct reports served as an objective measure of case difficulty (lower rate of correct report shows a more difficult case). For each of the 17 readers, the association between the abnormality scores from the gist study and detection rates from the earlier reports was examined using Spearman correlation. None of the coefficients were significantly different from zero (p<0.05). For the normal cases, the correlation coefficient between abnormality scores and detection rates for the 17 readers ranged from -0.262 to 0.258, and for cancer -0.180 to 0.309. The results suggest that the gist signal may indicate the presence of cancer, using mechanisms other than those employed in usual reporting, and might be exploited to improve breast cancer detection.
Can radiologists distinguish prior mammograms with no overt signs of cancer, taken from women who were later diagnosed with breast cancer, from the prior mammograms of women reported as normal and subsequently confirmed to be cancer-free? Twenty-three radiologists and breast physicians viewed 200 craniocaudal mammograms for a half-second and rated whether the woman would be recalled on a scale of 0 (clearly normal) to 100 (clearly abnormal). The dataset included five categories of mammograms, with each category containing 40 cases. The categories were Cancer (current cancer-containing mammograms), Prior-Vis (prior mammograms with visible cancer signs), Contra (current ‘normal’ mammograms contralateral to the cancer), Prior-Invis (priors without visible cancer signs), and Normal (priors of normal cases). For each radiologist, four paired analyses were performed to evaluate whether the radiologist could distinguish mammograms in each category from the normal mammograms: Cancer vs Normal, Prior-Vis vs Normal, Contra vs Normal, and Prior-Invis vs Normal. The area under the receiver operating characteristic curve (AUC) was calculated for each paired grouping and each radiologist. The Wilcoxon signed-rank test showed that the AUC values were above chance for all comparisons: Cancer (z=4.20, P<0.001); Prior-Vis (z=4.11, P<0.001); Contra (z=4.17, P<0.001); Prior-Invis (z=3.71, P<0.001). The results suggest that radiologists can distinguish patients who were diagnosed with cancer from individuals without breast cancer at an above-chance level based on a half-second glimpse of a mammogram, even before the lesion becomes visible (Prior-Invis). Apparently, something about the breast parenchyma can look abnormal before the appearance of a localized lesion.
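To illustrate the per-reader AUC described above, the following is a minimal sketch that computes AUC as the probability that a mammogram from the category of interest receives a higher rating than a normal one, with ties counted as half. The function name and the ratings are hypothetical and are not taken from the study; the authors' actual software is not described in the abstract.

```python
def auc_from_ratings(abnormal_scores, normal_scores):
    """Return the AUC for one reader and one paired grouping.

    AUC is computed as the fraction of (abnormal, normal) pairs in which
    the abnormal case has the higher rating; tied pairs count as 0.5.
    """
    if not abnormal_scores or not normal_scores:
        raise ValueError("both groups need at least one rating")
    wins = 0.0
    for a in abnormal_scores:
        for n in normal_scores:
            if a > n:
                wins += 1.0
            elif a == n:
                wins += 0.5
    return wins / (len(abnormal_scores) * len(normal_scores))


# Hypothetical 0-100 ratings from a single reader (illustrative only).
prior_invis_ratings = [55, 40, 70, 35, 60]
normal_ratings = [30, 45, 50, 20, 40]

reader_auc = auc_from_ratings(prior_invis_ratings, normal_ratings)
print(f"AUC (Prior-Invis vs Normal): {reader_auc:.2f}")
```

An AUC of 0.5 corresponds to chance; repeating this for every reader gives the set of AUC values that a signed-rank test could then compare against 0.5.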
Purpose: To determine the impact of Breast Screen Reader Assessment Strategy (BREAST) over time in improving radiologists’ breast cancer detection performance, and to identify the group of radiologists that benefit the most by using BREAST as a training tool. Materials and Methods: Thirty-six radiologists who completed three case-sets offered by BREAST were included in this study. The case-sets were arranged in radiologists’ chronological order of completion and five performance measures (sensitivity, specificity, location sensitivity, receiver operating characteristics area under the curve (ROC AUC) and jackknife alternative free-response receiver operating characteristic (JAFROC) figure–of-merit (FOM)), available from BREAST, were compared between case-sets to determine the level of improvement achieved. The radiologists were then grouped based on their characteristics and the above performance measures between the case-sets were compared. Paired t-tests or Wilcoxon signed-rank tests with statistical significance set at p < 0.05 were used to compare the performance measures. Results: Significant improvement was demonstrated in radiologists’ case-set performance in terms of location sensitivity and JAFROC FOM over the years, and radiologists’ location sensitivity and JAFROC FOM showed significant improvement irrespective of their characteristics. In terms of ROC AUC, significant improvement was shown for radiologists who were reading screen mammograms for more than 7 years and spent more than 9 hours per week reading mammograms. Conclusion: Engaging with case-sets appears to enhance radiologists’ performance suggesting the important value of initiatives such as BREAST. However, such performance enhancement was not shown for everyone, highlighting the need to tailor the BREAST platform to benefit all radiologists.
KEYWORDS: Breast, Mammography, Breast cancer, Teleradiology, Diagnostics, Cancer, Medical imaging, Radiology, Digital imaging, Image compression
Aim: To compare the performance of Australian and Singapore breast readers interpreting a single test-set that consisted of mammographic examinations collected from the Australian population. Background: In the teleradiology era, breast readers are interpreting mammographic examinations from different populations. The question arises whether two groups of readers with similar training backgrounds, demonstrate the same level of performance when presented with a population familiar only to one of the groups. Methods: Fifty-three Australian and 15 Singaporean breast radiologists participated in this study. All radiologists were trained in mammogram interpretation and had a median of 9 and 15 years of experience in reading mammograms respectively. Each reader interpreted the same BREAST test-set consisting of sixty de-identified mammographic examinations arising from an Australian population. Performance parameters including JAFROC, ROC, case sensitivity as well as specificity were compared between Australian and Singaporean readers using a Mann Whitney U test. Results: A significant difference (P=0.036) was demonstrated between the JAFROC scores of the Australian and Singaporean breast radiologists. No other significant differences were observed. Conclusion: JAFROC scores for Australian radiologists were higher than those obtained by the Singaporean counterparts. Whilst it is tempting to suggest this is down to reader expertise, this may be a simplistic explanation considering the very similar training and audit backgrounds of the two populations of radiologists. The influence of reading images that are different from those that radiologists normally encounter cannot be ruled out and requires further investigation, particularly in the light of increasing international outsourcing of radiologic reporting.
KEYWORDS: Mammography, Breast, Cancer, Breast cancer, Diagnostics, Digital mammography, Medical imaging, Digital imaging, Data analysis, Health sciences
This study aims to investigate the effectiveness of the single cranio-caudal (CC) mammogram in comparison with traditional two-projection mammography for breast cancer detection. Sixteen radiologists were invited to report 60 two-projection (MLO and CC) mammograms of the left and right breasts, of which 20 cases contained cancer. Participants searched for the presence of breast lesion(s) on each view and provided a confidence score. Sensitivity, lesion sensitivity and specificity were compared between the CC projection and the two-projection approach among different groups of readers. Results showed that expert readers needed only a single CC mammogram in their reading, while non-expert readers required two-projection mammography.
KEYWORDS: Mammography, Breast, Image analysis, Health sciences, Biopsy, Breast imaging, Cancer, Medical imaging, Breast cancer, Fine needle aspiration, Statistical analysis
Rationale and Objectives: This study will investigate the link between radiologists’ experience in reporting mammograms, their caseloads and the decision to give a classification of Royal Australian and New Zealand College of Radiologists (RANZCR) category ‘3’ (indeterminate or equivocal finding). Methods: A test set of 60 mammograms comprising 20 abnormal and 40 normal cases was shown to 92 radiologists. Each radiologist was asked to identify and localize abnormalities and provide a RANZCR assessment category. Details were obtained from each reader regarding their experience, qualifications and breast reading activities. ‘Equivocal fractions’ were calculated by dividing the number of ‘equivocal findings’ given by each radiologist in the abnormal and normal cases by the total number of cases analyzed: 20 and 40 respectively. The ‘equivocal fractions’ for each of the groups (normal vs abnormal) were calculated and independently correlated with age, number of years since qualification as a radiologist, number of years reading mammograms, number of mammograms read per year, number of hours reading mammograms per week and number of mammograms read over lifetime (the number of years reading mammograms multiplied by the number of mammograms read per year). The non-parametric Spearman test was used. Results: Statistically significant negative correlations were noted between ‘equivocal fractions’ and the following variables. For abnormal cases: hours per week (r = -0.38, P = 0.0001). For normal cases: total number of mammograms read per year (r = -0.29, P = 0.006); number of mammograms read over lifetime (r = -0.21, P = 0.049); hours reading mammograms per week (r = -0.20, P = 0.05). Conclusion: Radiologists with greater reading experience assign fewer RANZCR category 3 or equivocal classifications. The findings have implications for screening program efficacy and recall rates. This work is still in progress and further data will be presented at the conference.
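The ‘equivocal fraction’ and lifetime-reads definitions given in the Methods can be written out as a short worked example. This is only a sketch under stated assumptions: the function names are hypothetical, the category lists are invented for illustration, and category 3 is taken as the equivocal classification as stated in the abstract.

```python
def equivocal_fraction(categories, equivocal_category=3):
    """Fraction of cases that a reader assigned the equivocal category.

    `categories` holds one RANZCR assessment category (1-5) per case
    in a single group (abnormal or normal).
    """
    if not categories:
        raise ValueError("at least one case is required")
    equivocal_count = sum(1 for c in categories if c == equivocal_category)
    return equivocal_count / len(categories)


def lifetime_reads(years_reading, reads_per_year):
    """Years reading mammograms multiplied by mammograms read per year."""
    return years_reading * reads_per_year


# Hypothetical reader: 4 of 20 abnormal and 2 of 40 normal cases rated '3'.
abnormal_categories = [3] * 4 + [5] * 16
normal_categories = [3] * 2 + [1] * 38

abnormal_fraction = equivocal_fraction(abnormal_categories)  # 4 / 20
normal_fraction = equivocal_fraction(normal_categories)      # 2 / 40
total_reads = lifetime_reads(years_reading=10, reads_per_year=2000)
```

Each reader yields one fraction per group, and those fractions are what would then be correlated with the experience variables.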
Rationale and Objectives: To identify parameters linked to higher levels of performance in screening mammography. In particular we explored whether experience in reading digital cases enhances radiologists’ performance.
Methods: A total of 60 cases were presented to the readers, of which 20 contained cancers and 40 showed no abnormality. Each case comprised four images, and 129 breast readers participated in the study. Each reader was asked to identify and locate any malignancies using a 1-5 confidence scale. All images were displayed using 5MP monitors, supported by radiology workstations with full image manipulation capabilities. A jack-knife free-response receiver operating characteristic figure of merit (JAFROC FOM) methodology was employed to assess reader performance. Details were obtained from each reader regarding their experience, qualifications and breast reading activities. Spearman and Mann-Whitney U techniques were used for statistical analysis.
Results: Higher performance was positively related to numbers of years professionally qualified (r= 0.18; P<0.05), number of years reading breast images (r= 0.24; P<0.01), number of mammography images read per year (r= 0.28; P<0.001) and number of hours reading mammographic images per week (r= 0.19; P<0.04). Unexpectedly, higher performance was inversely linked to previous experience with digital images (r= - 0.17; p<0.05) and further analysis, demonstrated that this finding was due to changes in specificity.
Conclusion: The suggestion that readers with experience in reporting digital images may exhibit a reduced ability to correctly identify normal appearances requires further investigation. Higher performance is linked to number of cases read per year.
Aim: To examine the relationship between sensitivity measured from the BREAST test-set and clinical performance.
Background: Although the UK and Australia national breast screening programs have regarded PERFORMS and BREAST test-set strategies as possible methods of estimating readers' clinical efficacy, the relationship between test-set and real life performance results has never been satisfactorily understood.
Methods: Forty-one radiologists from BreastScreen New South Wales participated in this study. Each reader interpreted a BREAST test-set which comprised sixty de-identified mammographic examinations sourced from the BreastScreen Digital Imaging Library. Spearman's rank correlation coefficient was used to compare the sensitivity measured from the BREAST test-set with screen readers' clinical audit data.
Results: Results showed statistically significant, moderate positive correlations between test-set sensitivity and each of the following metrics: rate of invasive cancer per 10 000 reads (r=0.495; p < 0.01); rate of small invasive cancer per 10 000 reads (r=0.546; p < 0.001); detection rate of all invasive cancers and DCIS per 10 000 reads (r=0.444; p < 0.01).
Conclusion: Comparison between sensitivity measured from the BREAST test-set and real life detection rate demonstrated statistically significant positive moderate correlations which validated that such test-set strategies can reflect readers' clinical performance and be used as a quality assurance tool. The strength of correlation demonstrated in this study was higher than previously found by others.
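For readers unfamiliar with the statistic used above, Spearman's rank correlation is the Pearson correlation of the ranks of the two variables. The following is a minimal standard-library sketch with average ranks for ties; the function names and the sample values are hypothetical and do not come from the study's audit data.

```python
def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole group of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation computed on average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have equal length of at least 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mean_x, mean_y = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("rho is undefined when one variable is constant")
    return cov / (var_x * var_y) ** 0.5


# Hypothetical data: test-set sensitivity vs. detection rate per 10 000 reads.
test_set_sensitivity = [0.60, 0.70, 0.80, 0.90, 0.75]
clinical_detection_rate = [30, 45, 40, 60, 50]

rho = spearman_rho(test_set_sensitivity, clinical_detection_rate)
print(f"Spearman rho = {rho:.2f}")
```

The sketch returns only the coefficient; a significance test (the p-values reported above) would require an additional step that is not shown here.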
KEYWORDS: Breast, Diagnostics, Mammography, Breast imaging, Breast cancer, Image quality, Medical imaging, Digital imaging, Imaging systems, Image analysis
High quality breast imaging and accurate image assessment are critical to the early diagnoses, treatment and management of women with breast cancer. Breast Screen Reader Assessment Strategy (BREAST) provides a platform, accessible by researchers and clinicians world-wide, which will contain image data bases, algorithms to assess reader performance and on-line systems for image evaluation. The platform will contribute to the diagnostic efficacy of breast imaging in Australia and beyond on two fronts: reducing errors in mammography, and transforming our assessment of novel technologies and techniques. Mammography is the primary diagnostic tool for detecting breast cancer with over 800,000 women X-rayed each year in Australia, however, it fails to detect 30% of breast cancers with a number of missed cancers being visible on the image [1-6]. BREAST will monitor the mistakes, identify reasons for mammographic errors, and facilitate innovative solutions to reduce error rates. The BREAST platform has the potential to enable expert assessment of breast imaging innovations, anywhere in the world where experts or innovations are located. Currently, innovations are often being assessed by limited numbers of individuals who happen to be geographically located close to the innovation, resulting in equivocal studies with low statistical power. BREAST will transform this current paradigm by enabling large numbers of experts to assess any new method or technology using our embedded evaluation methods. We are confident that this world-first system will play an important part in the future efficacy of breast imaging.