This PDF file contains the front matter associated with SPIE Proceedings Volume 12035, including the Title Page, Copyright information, and Table of Contents.
Applying computer-aided detection (CAD)-generated quantitative image markers has demonstrated significant advantages over subjective qualitative assessment in supporting translational clinical research. However, although many advanced CAD schemes have been developed, the heterogeneity of medical images makes it a major challenge to achieve high scientific rigor with "black-box" CAD schemes trained on small datasets. To support and facilitate the research of physician investigators using quantitative imaging markers, we investigated and tested an interactive approach, developing CAD schemes with interactive functions and visual-aid tools. Unlike fully automated CAD schemes, our interactive CAD (ICAD) tools allow users to visually inspect image segmentation results and provide instructions to correct segmentation errors if needed. Based on the user's instructions, the CAD scheme automatically corrects the segmentation, recomputes image features, and generates machine learning-based prediction scores. To date, we have installed three ICAD tools in clinical image reading facilities, which support oncologists in acquiring image markers to predict progression-free survival of ovarian cancer patients undergoing angiogenesis-targeted chemotherapies, and neurologists in computing image markers and prediction scores to assess the prognosis of patients diagnosed with aneurysmal subarachnoid hemorrhage or acute ischemic stroke. Using these ICAD tools, clinical researchers have conducted several translational studies across diverse study cohorts, resulting in seven peer-reviewed papers in clinical journals over the last three years. Additionally, feedback from physician researchers indicates increased confidence in using new quantitative image markers and helps medical imaging researchers further improve and optimize the ICAD tools.
Automated quantification of intracranial artery morphology facilitates the identification of risk predictors for aneurysm development, as well as the detection of aneurysms. One established risk factor is the diameter of the intracranial arteries, which are arranged in a circulatory anastomosis called the Circle of Willis (CoW). In this work, we assessed the performance of manual and automated measurement of intracranial artery diameters. Automated measurements were obtained using a full-width-at-half-maximum (FWHM) approach. We investigated intra- and inter-rater variability, and compared manual to automatically obtained diameter measurements. The displacement error of manual measurement was assessed as another source of intra-rater variation. We used Bland–Altman plots and the intra-class correlation coefficient (ICC) for analysis. Overall, the assessment revealed acceptable intra- and inter-rater variability with no proportional bias. The median displacement error for repeated manual measurements (intra-rater) was 0.55 mm (IQR 0.24–1.06 mm). Good agreement was found between manual and automated measurements, with an ICC value of 0.76 (p<0.05, Pearson). These findings have implications for future assessment of intracranial artery diameters.
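The FWHM idea used for the automated measurements above can be sketched in a few lines. This is an illustrative reconstruction, not the authors' implementation: the 1-D intensity profile, the pixel spacing, and the linear sub-pixel interpolation are all assumptions.

```python
import numpy as np

def fwhm_diameter(profile, spacing_mm=1.0):
    """Estimate a vessel diameter as the full width at half maximum of a
    1-D intensity profile sampled across the vessel (hypothetical input)."""
    profile = np.asarray(profile, dtype=float)
    baseline = profile.min()
    half = baseline + (profile.max() - baseline) / 2.0
    idx = np.where(profile >= half)[0]
    if idx.size == 0:
        return 0.0
    left, right = idx[0], idx[-1]

    def crossing(i_out, i_in):
        # Linear sub-pixel interpolation between the sample below the
        # half-maximum (i_out) and the sample above it (i_in).
        y_out, y_in = profile[i_out], profile[i_in]
        frac = (half - y_out) / (y_in - y_out)
        return i_out + frac * (i_in - i_out)

    x_left = crossing(left - 1, left) if left > 0 else float(left)
    x_right = crossing(right + 1, right) if right < profile.size - 1 else float(right)
    return (x_right - x_left) * spacing_mm

# Synthetic check: a Gaussian profile has FWHM = 2*sqrt(2*ln 2)*sigma ~ 2.355*sigma
x = np.arange(81)
profile = np.exp(-0.5 * ((x - 40) / 6.0) ** 2)
print(round(fwhm_diameter(profile, spacing_mm=0.5), 2))
```

The synthetic Gaussian serves only as a sanity check that the half-maximum crossings are located correctly; real profiles would be sampled perpendicular to an extracted vessel centerline.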
In the last 5 to 10 years, there has been an enormous increase in the interest and use of network models in imaging. These are being considered for numerous imaging applications, including denoising, decision support, learned-feature selection, and many others. Network models "learn" solutions to imaging problems from labelled training data and an elaborate training regime. When a successful model is developed, it represents a computational algorithm for performing some task of interest. But it also encodes a solution to an imaging problem that may be intractable by conventional analytical means. Network models are therefore of interest for how they formulate a solution to the problem at hand. This work focuses on that process. We present two case studies in the analysis of neural networks. The first consists of a denoising network for digital breast tomosynthesis (DBT) images developed using a complex anatomical simulation of breast tissues and realistic x-ray transport physics. The second looks at a lesion detection network, also for DBT images, based on the same anatomical simulation model. For the denoising network, we find that it is very well represented by a linear operation that is effectively a Gaussian convolution kernel. The detection filter appears to be locally linear, but the filter profile appears to depend on what stimulus is used to probe the network. There does not appear to be any clear structure in the quadratic components from reverse correlation. Overall, this study shows how regression and reverse-correlation techniques can be used to analyze network models.
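The core of the probing idea described above is simple: if a network behaves as a linear, shift-invariant operator, its impulse response is its effective convolution kernel. The sketch below is an illustration under that assumption, using a known 1-D Gaussian smoother as a stand-in for the trained network; real networks are only approximately linear, which is why the paper also examines quadratic reverse-correlation components.

```python
import numpy as np

def make_gaussian_kernel(size=9, sigma=1.5):
    ax = np.arange(size) - size // 2
    k = np.exp(-0.5 * (ax / sigma) ** 2)
    return k / k.sum()

# Stand-in "network": a 1-D linear smoother. This is an illustration of
# the probing technique, not the paper's DBT denoising model.
true_kernel = make_gaussian_kernel()

def denoiser(signal):
    return np.convolve(signal, true_kernel, mode="same")

# Impulse probing: the response of a linear shift-invariant operator
# to a centered impulse IS its kernel.
impulse = np.zeros(65)
impulse[32] = 1.0
estimated = denoiser(impulse)[32 - 4:32 + 5]   # window matching kernel support
print(np.allclose(estimated, true_kernel))
```

For a non-linear network, the same probe only yields a local linearization around the chosen stimulus, consistent with the stimulus dependence the abstract reports for the detection filter.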
There is tremendous potential for AI-based quantitative imaging biomarkers to make clinical trials with standard-of-care CT more efficient. There is, however, a well-recognized gap between discovery and translation to practice for AI-based imaging biomarkers. Our goal is to enable more efficient and effective imaging clinical trials by characterizing the repeatability and reproducibility of AI-based imaging biomarkers. We used virtual imaging clinical trials (VCTs) to simulate the data pathway, estimating probability distribution functions for patient-, disease-, and imaging-related sources of variability. We evaluated the bias and variance in estimating the volume of liver lesions, and the variance of an algorithm that has shown success in predicting mortality risk for NSCLC patients. We used the volumetric XCAT anthropomorphic simulated phantom with inserted lesions of varied shape, size, and location. For CT acquisition and reconstruction we used the CatSim package, varying acquisition mAs and image reconstruction kernel. For each combination of parameters we generated 20 independent realizations with quantum and electronic noise. The resulting images were analyzed with the two AI-based imaging biomarkers described above, from which we computed the mean and standard deviation of the results. Mean values and/or bias results were counter-intuitive in some cases, e.g., lower mean bias in scans with lower mAs. Adding variations in lesion size, shape, and location increased the variance of the estimated parameters more than the mAs effects did. These results indicate the feasibility of using VCTs to estimate the repeatability and reproducibility of AI-based biomarkers used in clinical trials with standard-of-care CT.
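The repeatability analysis over repeated noise realizations can be sketched as below. All numbers are synthetic placeholders (the true volume, noise levels, and biases are assumptions, not the study's values); the point is the bookkeeping: for each acquisition setting, summarize the repeated estimates by percent bias and coefficient of variation.

```python
import numpy as np

rng = np.random.default_rng(0)

true_volume_ml = 4.2      # hypothetical lesion volume
n_realizations = 20       # independent noise realizations per setting

def measure(noise_sd, bias=0.0):
    """Stand-in for segment-and-measure applied to one simulated CT
    realization; returns one estimate per realization."""
    return true_volume_ml + bias + rng.normal(0.0, noise_sd, n_realizations)

# Two hypothetical acquisition settings (e.g., different mAs levels).
for label, sd, b in [("high mAs", 0.10, 0.05), ("low mAs", 0.25, 0.02)]:
    est = measure(sd, b)
    bias_pct = 100 * (est.mean() - true_volume_ml) / true_volume_ml
    cv_pct = 100 * est.std(ddof=1) / est.mean()
    print(f"{label}: bias {bias_pct:+.1f}%, repeatability CV {cv_pct:.1f}%")
```

Note that, as in the abstract's counter-intuitive finding, a setting with more noise can still show lower mean bias; bias and variance are separate properties and both must be reported.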
There is increasing interest in using deep learning and computer vision to help guide clinical decisions, such as whether to order a biopsy based on a mammogram. Existing networks are typically black boxes, unable to explain how they make their predictions. We present an interpretable deep-learning network that explains its predictions in terms of the BI-RADS features mass shape and mass margin. Our model predicts mass margin and mass shape, then uses the logits from those interpretable models to predict malignancy, also using an interpretable model. The interpretable mass-margin model explains its predictions using a prototypical-parts model. The interpretable mass-shape model predicts segmentations, fits an ellipse, then determines shape from the goodness of fit and the eccentricity of the fitted ellipse. While including mass-shape logits in the malignancy prediction model did not improve performance, we present this technique as part of a framework for better clinician-AI communication.
Purpose: Digital breast tomosynthesis (DBT) exhibits increased sensitivity and specificity compared to 2D mammography (DM), but DBT images are complex and interpretation takes longer. Clinicians may fatigue or reach a cognitive limit sooner when reading DBT, potentially reducing diagnostic accuracy. Eye-blink behaviour was investigated to explore fatigue and cognitive load.
Methods: Screeners (N=47) from five UK breast screening centres were eye-tracked as they read 40 DBT cases (15 normal, 6 benign, and 19 malignant) between November 2019 and July 2021. Differences in diagnostic accuracy and blink behaviour were analysed over the course of the reading session. Blink rates and case durations were investigated by case malignancy and outcome using t-tests and ANOVAs (α=0.05).
Results: Blink rates were higher on malignant cases than on normal cases (p=0.004), and higher for cases with true positive outcomes than for cases with true negative outcomes (p=0.013). Participants spent less time on malignant cases than on normal or benign cases (ps<0.0001), whilst spending more time on cases with a false positive outcome than on cases with a true negative or true positive outcome (ps<0.0001). No significant difference in blink rate or diagnostic performance was found over the course of the reporting session.
Conclusion: Differences in blink rate and time on case are associated with case malignancy and outcome, potentially reflecting varying cognitive demand and interpretation strategies. Further investigation into blinking during medical image interpretation may identify robust signals of cognition and fatigue that could be used for education and training purposes, whilst indicating optimal screening session duration.
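The group comparisons described in the Methods reduce to standard two-sample tests on per-case blink rates. The sketch below illustrates the analysis shape with synthetic numbers (the means, spreads, and sample sizes are hypothetical, not the study's eye-tracking data), using Welch's t-test to avoid assuming equal variances.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Hypothetical per-reading blink rates (blinks/min); real values would
# come from the eye-tracking recordings.
malignant = rng.normal(14.0, 3.0, 19 * 10)      # 19 malignant cases x readers
normal_cases = rng.normal(12.5, 3.0, 15 * 10)   # 15 normal cases x readers

# Welch's t-test (unequal variances), two-sided, alpha = 0.05
t, p = stats.ttest_ind(malignant, normal_cases, equal_var=False)
print(f"t = {t:.2f}, p = {p:.4f}, significant = {p < 0.05}")
```

In the actual study design, repeated readings by the same screener are not independent, so a mixed-effects model or per-reader averaging would be a more defensible variant of this comparison.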
Purpose: Medical errors are the third leading cause of death in the United States. Despite recent interventions, high error rates persist. Satisfaction of search (SOS) is a relatively less harmful type of bias that indicates an individual's decreased vigilance and/or awareness of additional abnormalities after the first abnormality has been identified. We examined SOS, well described in clinical practice, and how it best translates to clinical trial reads, where the diagnosis is typically already known.
Methods: SOS data from four different clinical trials, comprising 8036 timepoints with assessments across 1655 subjects, were reviewed by board-certified radiologist reviewers using Response Evaluation Criteria in Solid Tumors (RECIST) 1.1 and analyzed for new lesion identification.
Results: We analyzed the subset of subjects with progressive disease, which is usually the critical clinical trial endpoint in oncology. Once progressive disease was detected by the radiologist reviewer, additional new lesions tended not to be marked, or were missed, in a statistically significant proportion of cases. This may not reflect reviewer incompetence but rather an SOS error, in which satisfaction was reached on finding progressive disease, the trial endpoint being analogous to the first abnormality in clinical practice.
Conclusions: With SOS, once an abnormality is detected and recognized, additional attention is required to look for other possible abnormalities within the image. Additional abnormalities may be missed by the radiologist once the first abnormality is found. Several strategies can be used to mitigate SOS, including a systematic approach to ensure all relevant findings are identified, and a secondary search once the first finding is reported.
Deep neural networks used for reconstructing sparse-view CT data are typically trained by minimizing a pixel-wise mean-squared error or similar loss function over a set of training images. However, networks trained with such losses are prone to wipe out small, low-contrast features that are critical for screening and diagnosis. To remedy this issue, we introduce a novel training loss inspired by the model observer framework to enhance the detectability of weak signals in the reconstructions. We evaluate our approach on the reconstruction of synthetic sparse-view breast CT data, and demonstrate an improvement in signal detectability with the proposed loss.
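One common way to build an observer-inspired loss term of the kind described above is to penalize low detectability of a known signal under a fixed linear template. The sketch below is illustrative only and is not the paper's loss: the template, the weighting `lam`, and the paired signal-present/signal-absent construction are all assumptions.

```python
import numpy as np

def detectability_loss(recon_sp, recon_sa, template):
    """Negative squared detectability index d'^2 of a fixed linear template
    applied to signal-present (sp) vs signal-absent (sa) reconstructions.
    Minimizing this term pushes the network to preserve weak signals."""
    t_sp = recon_sp.reshape(len(recon_sp), -1) @ template
    t_sa = recon_sa.reshape(len(recon_sa), -1) @ template
    num = (t_sp.mean() - t_sa.mean()) ** 2
    den = 0.5 * (t_sp.var(ddof=1) + t_sa.var(ddof=1)) + 1e-12
    return -num / den

def total_loss(recon_sp, recon_sa, target_sp, target_sa, template, lam=0.1):
    # Pixel-wise MSE plus the (negative) detectability term.
    mse = np.mean((recon_sp - target_sp) ** 2) + np.mean((recon_sa - target_sa) ** 2)
    return mse + lam * detectability_loss(recon_sp, recon_sa, template)

# Toy check: a weak Gaussian signal added to white-noise backgrounds (8x8)
rng = np.random.default_rng(2)
xx, yy = np.meshgrid(np.arange(8), np.arange(8))
signal = 0.5 * np.exp(-((xx - 4) ** 2 + (yy - 4) ** 2) / 4.0)
sa = rng.normal(0, 1, (200, 8, 8))
sp = sa + signal                      # paired backgrounds for illustration
loss = total_loss(sp, sa, sp, sa, signal.ravel())
print(loss < 0)                       # MSE is zero here; detectability term dominates
```

In an actual training loop this would be differentiated through the reconstruction network, so the same expression would be written with autodiff tensors rather than NumPy arrays.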
Humans process visual information with varying resolution (a foveated visual system) and explore images by making eye movements that orient the high-resolution fovea to points of interest. The Bayesian ideal searcher (IS), which employs complete knowledge of task-relevant information, optimizes the eye-movement strategy and achieves optimal search performance. The IS can be employed as an important tool to evaluate the optimality of human eye movements, and can potentially provide guidance to improve human visual search strategies. Najemnik and Geisler (2005) derived an IS for backgrounds of spatial 1/f noise. The corresponding template responses follow Gaussian distributions, and the optimal search strategy can be determined analytically. However, the computation of the IS can be intractable for more realistic and complex backgrounds such as medical images. Modern reinforcement learning methods, successfully applied to obtain optimal policies for a variety of tasks, do not require complete knowledge of the background-generating functions and can potentially be applied to anatomical backgrounds. An important first step is to validate the optimality of the reinforcement learning method. In this study, we investigate the ability of a reinforcement learning method that employs a Q-network to approximate the IS. We demonstrate that the search strategy corresponding to the Q-network is consistent with the IS search strategy. The findings show the potential of the Q-network reinforcement learning approach to estimate optimal eye-movement planning with real anatomical backgrounds.
The diagnostic performance of radiologist readers exhibits substantial variation that cannot be explained by CT acquisition protocol differences. Studying reader detectability from CT images may help identify why certain types of lesions are missed by multiple or specific readers. Ten subspecialized abdominal radiologists marked all suspected metastases in a multi-reader multi-case study of 102 deidentified contrast-enhanced CT liver scans at multiple radiation dose levels. A reference reader marked ground-truth metastatic and benign lesions with the aid of histopathology or tumor progression on later scans. Multi-slice image patches and 3D radiomic features were extracted from the CT images. We trained deep convolutional neural networks (CNNs) to predict whether an average (generalized) or individual radiologist reader would detect or miss a specific metastasis from an image patch containing it. The individualized CNN showed higher performance, with an area under the receiver operating characteristic curve (AUC) of 0.82, compared to a generalized one (AUC = 0.78) in predicting reader-specific detectability. Random forests were used to build the corresponding individualized and generalized predictors from radiomic features. Both the individualized (AUC = 0.64) and generalized (AUC = 0.59) radiomic predictors showed limited ability to differentiate detected from missed lesions. This shows that CNNs can identify and learn automated features that are better predictors of reader detectability of lesions than radiomic features. Individualized prediction of difficult lesions may allow targeted training of idiosyncratic weaknesses but requires substantial training data for each reader.
Multiple studies based on objective assessment of image quality (OAIQ) have reported that several deep-learning (DL)-based denoising methods show limited performance on signal-detection tasks. Our goal was to investigate the reasons for this limited performance. To achieve this goal, we conducted a task-based characterization of a DL-based denoising approach for individual signal properties. We conducted this study in the context of evaluating a DL-based approach for denoising single-photon emission computed tomography (SPECT) images. The training data consisted of signals of different sizes and shapes within a clustered lumpy background, imaged with a 2D parallel-hole-collimator SPECT system. The projections were generated at normal and 20% low-dose levels, both of which were reconstructed using an ordered-subsets expectation-maximization (OSEM) algorithm. A convolutional neural network (CNN)-based denoiser was trained to denoise the low-dose images. The performance of this CNN was characterized for five different signal sizes and four different signal-to-background ratios (SBRs) by designing each evaluation as a signal-known-exactly/background-known-statistically (SKE/BKS) signal-detection task. Performance on this task was evaluated using an anthropomorphic channelized Hotelling observer (CHO). We observed that the DL-based denoising approach did not improve performance on the signal-detection task for any of the signal types. Further investigation revealed a drop in the mean of the differences between the signal-present and signal-absent images, explaining the limited performance on detection tasks. Additionally, evaluation with fidelity-based figures of merit (root mean square error and structural similarity index) directly contradicted the observer-study findings for all signals.
Overall, these results provide new insights into the performance of the DL-based denoising approach as a function of signal size and contrast. More generally, the observer-study-based characterization provides a mechanism to evaluate the sensitivity of the method to specific object properties, and may be explored as analogous to characterizations such as the modulation transfer function for linear systems. Finally, this work underscores the need for objective task-based evaluation of DL-based denoising approaches.
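The CHO evaluation described above follows a standard construction: project images onto a small set of channels, then apply the Hotelling template in channel space. The sketch below illustrates that construction with simple radial Gaussian channels on synthetic white-noise images; the channel set, image statistics, and signal are all placeholders, not the anthropomorphic channels or SPECT data from the study.

```python
import numpy as np

def cho_detectability(sp_imgs, sa_imgs, channels):
    """Channelized Hotelling observer detectability (SNR). Channels are
    illustrative; an anthropomorphic CHO would also include internal noise."""
    U = channels.reshape(channels.shape[0], -1).T          # pixels x channels
    v_sp = sp_imgs.reshape(len(sp_imgs), -1) @ U           # channel outputs
    v_sa = sa_imgs.reshape(len(sa_imgs), -1) @ U
    dv = v_sp.mean(0) - v_sa.mean(0)                       # mean channel signal
    S = 0.5 * (np.cov(v_sp.T) + np.cov(v_sa.T))            # intra-class covariance
    w = np.linalg.solve(S, dv)                             # Hotelling template
    return float(np.sqrt(dv @ w))                          # d'

# Toy example: 3 radial Gaussian channels, Gaussian signal in white noise
n, size = 300, 16
rng = np.random.default_rng(3)
xx, yy = np.meshgrid(np.arange(size) - size // 2, np.arange(size) - size // 2)
r2 = xx ** 2 + yy ** 2
channels = np.stack([np.exp(-r2 / (2 * s ** 2)) for s in (1.5, 3.0, 6.0)])
signal = 0.8 * np.exp(-r2 / 8.0)
sa = rng.normal(0, 1, (n, size, size))
sp = rng.normal(0, 1, (n, size, size)) + signal
print(cho_detectability(sp, sa, channels) > 0)
```

Comparing this figure of merit before and after denoising, per signal size and SBR, is exactly the kind of characterization the abstract argues for, in contrast to RMSE or SSIM.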
The most frequently used model for simulating multireader multicase (MRMC) data that emulate confidence-of-disease ratings from diagnostic imaging studies has been the Roe and Metz model, proposed by Roe and Metz in 1997 and later generalized by Hillis (2012), Abbey et al (2013) and Gallas and Hillis (2014). All of these models generate continuous confidence-of-disease ratings based on an underlying binormal model for each reader, with the separation between the normal and abnormal rating distributions varying across readers.
Numerous studies have used these models for evaluating MRMC analysis and sample-size methods. The models suggested in these papers for assessing type I error have been "null" models, in which the expected AUC across readers is the same for each test. However, for the null models that have been suggested, there remain other differences between the tests that would not exist if the two tests were identical.
None of the papers cited above discuss how to formulate a null model that is also an "identical-test" model, where the two tests are identical in all respects. The purpose of this paper is to show how to formulate an identical-test model and to discuss the importance of this model. Using the identical-test model, I show through simulations the importance of the Obuchowski-Rockette model constraints to avoid a negative variance estimate, a result which had not previously been empirically demonstrated.
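A Roe–Metz-style simulation of continuous confidence-of-disease ratings can be sketched as below for a single modality. The variance components, the reader-specific separations, and the sample sizes are hypothetical illustrations, not the values used in any of the cited papers; the sketch only shows the latent-score structure (reader, case, and reader-by-case effects) and the per-reader empirical AUC.

```python
import numpy as np

rng = np.random.default_rng(4)

# Hypothetical setup: 5 readers, 50 non-diseased and 50 diseased cases.
n_readers, n_neg, n_pos = 5, 50, 50
var_r, var_c, var_rc, var_e = 0.05, 0.30, 0.05, 0.60   # illustrative components

truth = np.r_[np.zeros(n_neg), np.ones(n_pos)]
sep = 1.5 + rng.normal(0, 0.2, n_readers)   # reader-specific separation, as in
                                            # the binormal-per-reader structure
reader_eff = rng.normal(0, np.sqrt(var_r), n_readers)
case_eff = rng.normal(0, np.sqrt(var_c), n_neg + n_pos)

scores = (sep[:, None] * truth[None, :]     # disease separation per reader
          + reader_eff[:, None]             # reader shift (cancels in AUC)
          + case_eff[None, :]               # case difficulty
          + rng.normal(0, np.sqrt(var_rc), (n_readers, n_neg + n_pos))
          + rng.normal(0, np.sqrt(var_e), (n_readers, n_neg + n_pos)))

def auc(neg, pos):
    """Empirical AUC via the Wilcoxon-Mann-Whitney statistic."""
    return float((pos[:, None] > neg[None, :]).mean()
                 + 0.5 * (pos[:, None] == neg[None, :]).mean())

aucs = [auc(s[truth == 0], s[truth == 1]) for s in scores]
print(np.round(aucs, 3))
```

In an identical-test model, the second test's ratings would reuse the same effects (not merely the same variance components), which is the distinction the paper develops.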
The Medical Imaging and Data Resource Center (MIDRC) is a multi-institutional effort to accelerate medical imaging machine intelligence research and create a publicly available image repository/commons as well as a sequestered database for performance evaluation and benchmarking of algorithms. After de-identification, approximately 80% of the medical images and associated meta-data will become part of the open repository and 20% will be sequestered and kept separate from the open commons. To ensure that both the public, open dataset and the sequestered dataset are representative of the population available, demographic characteristics across the two datasets must be balanced. Our method uses multidimensional stratified sampling where several demographic variables of interest are sequentially used to separate the data into individual strata, each representing a unique combination of variables. Within each stratum, patients are randomly assigned to the open set (80%) or the sequestered set (20%). Thus, for p variables of interest, the balance of the p-dimensional distribution of variable combinations can be controlled. This algorithm was used on an example COVID-19 dataset containing image exams of 4662 patients using the variables of race, age, sex at birth, and ethnicity, each containing 8, 8, 2, and 4 categories, respectively. After stratification of this dataset into the two subsets, resulting distributions of each variable matched the distribution from the original dataset with a maximum percent difference from its original fraction of 0.4%. These results demonstrate that the implemented process of multi-dimensional sequential stratified sampling can partition a large database while maintaining balance across several variables.
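The stratified 80/20 split described above can be sketched directly: group patients by the joint combination of the demographic variables, then split randomly within each stratum. The field names, category encodings, and toy cohort below are illustrative, not MIDRC's actual schema.

```python
import random
from collections import defaultdict

def stratified_split(patients, keys, frac_open=0.8, seed=0):
    """Split patients into open (~80%) and sequestered (~20%) sets,
    stratified on the joint combination of the given variables."""
    strata = defaultdict(list)
    for p in patients:
        strata[tuple(p[k] for k in keys)].append(p)
    rng = random.Random(seed)
    open_set, sequestered = [], []
    for members in strata.values():
        rng.shuffle(members)
        n_open = round(frac_open * len(members))
        open_set.extend(members[:n_open])
        sequestered.extend(members[n_open:])
    return open_set, sequestered

# Toy cohort mirroring the study's four variables and category counts
# (8 race, 8 age, 2 sex, 4 ethnicity categories; values are random).
gen = random.Random(1)
cohort = [{"race": gen.randrange(8), "age_bin": gen.randrange(8),
           "sex": gen.randrange(2), "ethnicity": gen.randrange(4)}
          for _ in range(4662)]
open_set, seq = stratified_split(cohort, ("race", "age_bin", "sex", "ethnicity"))
print(len(open_set), len(seq))
```

Because the assignment is done inside every stratum, the marginal distribution of each variable in both subsets matches the full cohort up to per-stratum rounding, which is the balance property the abstract reports.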
A common study design for comparing the performances of diagnostic imaging tests is to obtain ratings from multiple readers of multiple cases whose true statuses are known. Typically, there is overlap between the tests, readers, and/or cases, for which special analytical methods are needed to perform statistical comparisons. We present our new MATLAB MRMCaov toolbox, which is designed for multi-reader multi-case comparisons of two or more diagnostic tests. The toolbox allows for statistical comparison of reader performance metrics, such as area under the receiver operating characteristic curve (ROC AUC), with analysis of variance methods originally proposed by Obuchowski and Rockette (1995) and later unified and improved by Hillis and colleagues (2005, 2007, 2008, 2018). MRMCaov is open-source software with an integrated command-line interface for performing multi-reader multi-case statistical analysis, plotting, and presenting results. Its features include (1) ROC AUC, likelihood ratios of positive or negative ratings, sensitivity, specificity, and expected utility reader performance metrics; (2) reader-specific ROC curves; (3) user-definable performance metrics; (4) test-specific estimates of mean performance along with confidence intervals and p-values for statistical comparisons; (5) support for factorial, nested, or partially paired study designs; (6) inference for random or fixed readers and cases; (7) DeLong, jackknife, or unbiased covariance estimation; and (8) compatibility with Microsoft Windows, Mac OS, and Linux.
Current literature has described the usefulness of DBT in addition to FFDM, given the increase in cancer detection and decrease in recall rates. The primary limitation of using FFDM plus DBT for screening is the radiation dose, which approximately doubles when both modalities are used. Synthesized two-dimensional views can instead be reconstructed from DBT slices, with the goal of replacing FFDM. Although many studies have explored the value of DBT in addition to FFDM, little attention has been given to the effectiveness that synthesized views might bring radiologists as a supplementary view for DBT. The aim of this study was to investigate the diagnostic accuracy of radiology trainees reading DBT alone compared with DBT plus the synthesized view (C-View). Twenty radiology trainees were asked to report a set of 35 two-projection DBT images of left and right breasts (15 were cancer cases). Another group of 8 trainees read the same DBT set with the addition of the C-View. Participants searched for lesions within the cases using the Tabar RANZCR system, in which 2 represented a benign lesion and 3-5 represented suspicion of malignancy, with higher values indicating a higher likelihood of malignancy. Reader performance was evaluated via specificity, sensitivity, lesion sensitivity, ROC, and JAFROC between the two reading modes. The diagnostic metrics of participants reading DBT only were not significantly different from those of the group reading DBT plus the synthesized view (P > 0.05). This finding implies that viewing DBT only could be equivalent to DBT plus C-View for radiology trainees.
Visual grading characteristic (VGC) analysis was used to investigate the performance of interventional radiologists and interventional radiographers when assessing uterine artery embolisation (UAE) image quality. The observers rated the image quality of 20 randomised DSA (digital subtraction angiography) series using a five-point rating scale, which compared Group A (optimised UAE radiation dose; n = 50) with a reference Group B (control group; n = 50). VGC analysis resulted in an area under the VGC curve (AUCVGC) of 0.52 for interventional radiologists (P = 0.83) and 0.55 for interventional radiographers (P = 0.61). Radiation dose reduction had no effect on observer image quality assessments.
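The area under the VGC curve can be computed nonparametrically from the ordinal ratings themselves, since the trapezoidal AUC_VGC is equivalent to the Wilcoxon-Mann-Whitney statistic on the two groups' ratings. The sketch below illustrates this with hypothetical five-point ratings, not the study's DSA data.

```python
import numpy as np

def auc_vgc(ratings_a, ratings_b):
    """Nonparametric AUC_VGC via the Wilcoxon-Mann-Whitney statistic on
    ordinal image-quality ratings; ties contribute one half."""
    a = np.asarray(ratings_a)[:, None]
    b = np.asarray(ratings_b)[None, :]
    return float((a > b).mean() + 0.5 * (a == b).mean())

# Hypothetical 5-point image-quality ratings for the two dose groups
group_a = [3, 4, 3, 5, 2, 4, 3, 3, 4, 2]   # optimised dose
group_b = [3, 3, 4, 4, 2, 3, 3, 5, 4, 2]   # reference / control
print(round(auc_vgc(group_a, group_b), 2))
```

An AUC_VGC near 0.5, as reported for both observer groups in the abstract, indicates that the two dose groups' ratings are statistically indistinguishable.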
Purpose: Fatigue may lead to high rates of medical error among radiologists. Fatigue is often described as feelings of weakness, lack of energy, and a desire to rest, and is associated with impaired ability to function. Visual fatigue is important in medical imaging because errors (false negatives) are relatively common. Blinded independent central review (BICR) is a widely used method employed in many oncology registration trials. Ongoing monitoring of radiologist "reviewer" performance is both good clinical trial practice and a requirement of regulatory authorities. We use the reader disagreement index (RDI) as a potential tool to identify reader fatigue and compare fatigue in reviewers performing a single type of study versus multiple types.
Methods: A retrospective analysis of reviewers' RDI in four different clinical trials was performed. The performance of 14 board-certified radiologist reviewers was analyzed, with data for 3750 subjects comprising a total of 15105 timepoints across all trials, reviewed using several established imaging assessment criteria. The objective was to evaluate whether RDI could serve as an effective surrogate marker for read quality as affected by reader fatigue.
Results: The results indicate that RDI can be used as a tool to track reader quality, which in turn may predict reader fatigue. In the random pool of readers and studies analyzed, we did not observe any major trend or impact on read quality, noting that these trials were in any case actively monitored for read volume distribution and quality.
Conclusions: Fatigue may lead to high rates of medical error among radiologists. RDI can serve as a good surrogate for read quality to monitor reader fatigue. Based on the results, it appears preferable to undertake more cases within a single study than fewer cases across different types of studies to prevent reader fatigue.
Medical imaging is a complex field driven by advances in technology, yet without the human observer to make sense of the images acquired and to convey to other health care providers and patients what those images reveal about disease processes, there is no impact on patient health and outcomes. Medical image perception research has an evolving role and a real impact on technology development, deployment, and use; future opportunities exist for perception investigations, and collaborations may further enhance the field's visibility and clinical impact. It will continue to play a critical role in improving the efficacy and efficiency of healthcare and patient outcomes.
Breast density is an important breast cancer risk factor, both through its association with decreased mammography sensitivity and as an independent risk factor. This research aims to establish the distribution of breast density in the Saudi screening population and to identify the relationship between visual and automated breast density methods. Screening mammograms from 2905 cancer-free women were retrospectively collected from the Saudi National Breast Cancer Screening Programme. Breast density of screening mammograms was assessed visually by 11 radiologists using the Breast Imaging Reporting and Data System (BI-RADS) 5th edition and a Visual Analogue Scale (VAS), and by automated methods: predicted VAS processed (pVASprocessed), predicted VAS raw (pVASraw), and VolparaTM. The relationship between breast density methods was assessed using the intra-class correlation coefficient (ICC) and weighted kappa (κ). Results indicated that around one-third of Saudi women of screening age had high breast density (BI-RADS C/D: 31.5%; Volpara Density Grade (VDG) C/D: 29.0%). Full screening mammograms from 1022 women were used to assess the relationship between all methods. Predicted VAS estimates of percent density were generally lower than VAS. The highest ICC was between VAS and pVASraw (ICC=0.86, 95% CI 0.84-0.88). For categorical breast density methods, VDG 5th edition showed fair agreement with BI-RADS 5th edition (κ=0.35, 95% CI 0.29-0.39). In conclusion, this study shows that the majority of Saudi women of screening age have low breast density as shown by visual and automated methods, and that there is a positive relationship between visual and automated methods, which is strongest between VAS and pVASraw.
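The weighted kappa used to compare the categorical density grades can be sketched in a few lines. This is a generic linearly weighted kappa (pure Python; the abstract does not state the exact weighting scheme used, so linear disagreement weights are an assumption):

```python
def weighted_kappa(rater1, rater2, k):
    """Linearly weighted kappa for two sets of ratings on k ordered
    categories coded 0..k-1. Disagreement weight w_ij = |i-j|/(k-1),
    so near-misses are penalised less than distant disagreements.
    """
    n = len(rater1)
    obs = [[0.0] * k for _ in range(k)]        # observed joint proportions
    for a, b in zip(rater1, rater2):
        obs[a][b] += 1.0 / n
    pa = [sum(obs[i]) for i in range(k)]                 # row marginals
    pb = [sum(row[j] for row in obs) for j in range(k)]  # column marginals
    w_obs = sum(abs(i - j) / (k - 1) * obs[i][j]
                for i in range(k) for j in range(k))
    w_exp = sum(abs(i - j) / (k - 1) * pa[i] * pb[j]
                for i in range(k) for j in range(k))
    return 1.0 - w_obs / w_exp
```

With four BI-RADS/VDG categories (A-D coded 0-3), perfect agreement yields kappa = 1, chance-level agreement 0, and systematic disagreement negative values; the study's κ=0.35 sits in the conventional "fair" band.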
Objective evaluation of quantitative imaging (QI) methods with patient data is highly desirable but is hindered by the lack, or unreliability, of an available gold standard. To address this issue, techniques that can evaluate QI methods without access to a gold standard are being actively developed. These techniques assume that the true and measured values are linearly related by a slope, a bias, and a Gaussian-distributed noise term, where the noise in measurements made by different methods is independent. However, this noise arises in the process of measuring the same quantitative value, and thus can be correlated. To address this limitation, we propose a no-gold-standard evaluation (NGSE) technique that models this correlated noise by a multivariate Gaussian distribution parameterized by a covariance matrix. We derive a maximum-likelihood-based approach to estimate the parameters that describe the relationship between the true and measured values, without any knowledge of the true values. We then use the estimated slopes and the diagonal elements of the covariance matrix to compute the noise-to-slope ratio (NSR) and rank the QI methods on the basis of precision. The proposed NGSE technique was evaluated with multiple numerical experiments. Our results showed that the technique reliably estimated the NSR values and yielded accurate rankings of the considered methods in 83% of 160 trials. In particular, the technique correctly identified the most precise method in ∼97% of the trials. Overall, this study demonstrates the efficacy of the NGSE technique in accurately ranking different QI methods when correlated noise is present, without any knowledge of the ground truth. The results motivate further validation of this technique with realistic simulation studies and patient data.
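Once the maximum-likelihood step has produced slope and covariance estimates, the ranking step itself is simple. A sketch of just that final step (the no-gold-standard estimation is the paper's contribution and is not reproduced here; reading NSR as sqrt(variance)/|slope| is a plausible interpretation of the abstract's definition):

```python
import math

def rank_by_nsr(slopes, noise_variances):
    """Rank QI methods by noise-to-slope ratio NSR_m = sqrt(sigma_m^2)/|a_m|.

    `slopes` are the estimated linear slopes a_m relating true to measured
    values; `noise_variances` are the diagonal elements of the estimated
    covariance matrix. Lower NSR means higher precision.
    """
    nsr = [math.sqrt(v) / abs(a) for a, v in zip(slopes, noise_variances)]
    ranking = sorted(range(len(nsr)), key=nsr.__getitem__)
    return ranking, nsr

# With equal noise variance, the method with the larger slope is more
# precise: method 1 has NSR 0.1 versus method 0's 0.2, so it ranks first.
ranking, nsr = rank_by_nsr([1.0, 2.0], [0.04, 0.04])
```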
Printed phantoms hold great potential as a tool for examining task-based image quality of x-ray imaging systems. Their ability to produce complex shapes rendered in materials with adjustable attenuation coefficients allows a new level of flexibility in the design of tasks for the evaluation of physical imaging systems. We investigate performance in a fine “boundary discrimination” task in which fine features at the margin of a clearly visible “lesion” are used to classify the lesion as malignant or benign. These tasks are appealing because of their relevance to clinical tasks, and because they typically emphasize higher spatial frequencies relative to more common lesion detection tasks. A 3D-printed phantom containing cylindrical shells of varying thickness was used to generate lesion profiles that differed in their edge profiles. This was intended to approximate lesions with indistinct margins, which are clinically associated with malignancy. Wall thickness in the phantom ranged from 0.4 mm to 0.8 mm, which allows task difficulty to be varied by choosing different thicknesses to represent malignant and benign lesions. The phantom was immersed in a tub filled with water and potassium phosphate to approximate the attenuating background, and imaged repeatedly on a benchtop cone-beam CT scanner. After preparing the image data (reconstruction, ROI selection, sub-pixel registration), we find that the mean frequency of the lesion profile is 0.11 cyc/mm. The mean frequency of the lesion-difference profile, representative of the discrimination task, is approximately 6 times larger. Model observers show appropriate performance as a function of dose in these tasks as well.
Modern generative models, such as generative adversarial networks (GANs), hold tremendous promise for several areas of medical imaging, such as unconditional medical image synthesis, image restoration, reconstruction and translation, and optimization of imaging systems. However, procedures for establishing stochastic image models (SIMs) using GANs remain generic and do not address specific issues relevant to medical imaging. In this work, canonical SIMs that simulate realistic vessels in angiography images are employed to evaluate procedures for establishing SIMs using GANs. The GAN-based SIM is compared to the canonical SIM based on its ability to reproduce those statistics that are meaningful to the particular medically realistic SIM considered. It is shown that evaluating GANs using classical metrics and medically relevant metrics may lead to different conclusions about the fidelity of the trained GANs. This work highlights the need for the development of objective metrics for evaluating GANs.
Translation of CAD-AI Methods to Clinical Practice: Are We There Yet? Joint Session with Conferences 12033 and 12035
The performance of Deep Learning (DL) segmentation algorithms is routinely determined using quantitative metrics like the Dice score and Hausdorff distance. However, these metrics show a low concordance with humans’ perception of segmentation quality. The successful collaboration of health care professionals with DL segmentation algorithms will require a detailed understanding of experts’ assessment of segmentation quality. Here, we present the results of a study on expert quality perception of brain tumor segmentations of brain MR images generated by a DL segmentation algorithm. Eight expert medical professionals were asked to grade the quality of segmentations on a scale from 1 (worst) to 4 (best). To this end, we collected four ratings for a dataset of 60 cases. We observed a low inter-rater agreement among all raters (Krippendorff’s alpha: 0.34), which potentially is a result of different internal cutoffs for the quality ratings. Several factors, including the volume of the segmentation and model uncertainty, were associated with high disagreement between raters. Furthermore, the correlations between the ratings and commonly used quantitative segmentation quality metrics ranged from no to moderate correlation. We conclude that, similar to the inter-rater variability observed for manual brain tumor segmentation, segmentation quality ratings are prone to variability due to the ambiguity of tumor boundaries and individual perceptual differences. Clearer guidelines for quality evaluation could help to mitigate these differences. Importantly, existing technical metrics do not capture clinical perception of segmentation quality. A better understanding of expert quality perception is expected to support the design of more human-centered DL algorithms for integration into the clinical workflow.
A Computer-Aided Triage and Notification (CADt) device uses artificial intelligence (AI) to prioritize radiological medical images and speed up review of diseased cases in time-sensitive conditions such as stroke, intracranial hemorrhage, and pneumothorax. However, questions remain about the quantitative assessment of the clinical effectiveness of CADt devices in speeding the review of patient images with time-sensitive conditions. This work presents an analytical method based on queueing theory to quantify the wait-time savings and to study the impact of CADt in various clinical settings. Theoretical results are consistent with clinical intuition and are verified by Monte Carlo simulations.
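The abstract does not reproduce the queueing-theory derivation, but the kind of Monte Carlo check it mentions can be sketched. Below is a minimal non-preemptive single-reader simulation (all parameters are hypothetical, and AI triage is idealized as flagging exactly the diseased cases) comparing the mean wait of diseased cases with and without CADt prioritization:

```python
import heapq
import random

def mean_diseased_wait(lam, mu, p_dis, n=20000, cadt=True, seed=1):
    """Mean wait of diseased cases at a single reader.

    Poisson arrivals (rate lam), exponential read times (rate mu).
    With CADt, flagged (here: truly diseased) cases are read first,
    non-preemptively; without CADt the worklist is FIFO.
    """
    rng = random.Random(seed)
    t, arrivals = 0.0, []
    for _ in range(n):
        t += rng.expovariate(lam)
        arrivals.append((t, rng.random() < p_dis))
    free_at, queue, waits, i = 0.0, [], [], 0
    while i < len(arrivals) or queue:
        # Admit every case arriving before the reader next becomes free.
        while i < len(arrivals) and (not queue or arrivals[i][0] <= free_at):
            at, dis = arrivals[i]
            key = (0 if dis else 1, at) if cadt else (at,)
            heapq.heappush(queue, (key, at, dis))
            i += 1
        _, at, dis = heapq.heappop(queue)
        start = max(free_at, at)
        if dis:
            waits.append(start - at)
        free_at = start + rng.expovariate(mu)
    return sum(waits) / len(waits)

# At 80% reader utilization with 10% diseased prevalence, prioritization
# should cut the diseased-case wait well below the FIFO wait.
w_cadt = mean_diseased_wait(0.8, 1.0, 0.1, cadt=True)
w_fifo = mean_diseased_wait(0.8, 1.0, 0.1, cadt=False)
```

In a matching M/M/1 analysis the FIFO wait is λ/(μ(μ−λ)), which such a simulation can be checked against.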
The initial impressions about the presence of abnormality (the gist signal) from some radiologists are as accurate as decisions made under normal presentation conditions, while the performance of others is only slightly better than chance level. This study investigates whether there is a subset of radiologists (“super-gisters”) whose gist signal is more reliable and consistently more accurate than that of others. To measure the gist signal, images were presented for less than half a second. We collected the gist signals from thirty-nine radiologists, who assessed 160 mammograms twice with a wash-out period of one month. Readers were categorized as “super-gisters” and “others” by fitting a Gaussian mixture model to the average area under the receiver operating characteristic curve (AUC) values of the radiologists in the two rounds. The median intra-class correlation (ICC) for the “super-gisters” was 0.63 (IQR: 0.51-0.691), while the median ICC for the “others” was 0.51 (IQR: 0.42-0.59). The difference between the two groups was significant (p=0.015). The number of mammograms interpreted by the radiologist per week did not differ significantly between “super-gisters” and others (medians of 237 versus 200, p=0.336). A linear mixed model, which treated both case and reader as random variables, showed that only “super-gisters” can perceive the gist of the abnormal on negative prior mammograms from women who developed breast cancer. Although the gist signal is noisy, a subset of readers has a superior capability for detecting the gist of the abnormal, and only the scores given by these readers are useful and reliable for predicting future breast cancer.
Mammographic test sets are a simulation-based training methodology for radiologists to assess and improve their performance. However, while test-set records have indicated over-time improvements in participants' performance within the tests, little is known about how those improvements translate into breast-screening readers’ performance in the clinic. This study investigated how the performance of readers who completed test-set training on the BreastScreen Reader Assessment Strategy (BREAST) platform has evolved in comparison to readers with no history of test-set participation. An investigation of 10-year clinical audit data from 46 breast screening readers in New South Wales, Australia indicated that BREAST readers improved their positive predictive value (PPV) (p=0.001) in association with their test-set participation. They also had higher detection rates for invasive cancers (p=0.01), for ductal carcinoma in situ (DCIS) (p=0.03), and for all cancers and DCIS combined (p=0.01). In comparison, non-BREAST readers improved their recall rate in subsequent screens (p=0.03) and their PPV (p=0.02). In conclusion, test-set participation is linked to an enhanced capability of cancer detection, which may be due to the high proportion of cancer cases in the test sets in comparison to normal practice.
One of the benefits of expertise is thought to be the ability to reduce complex data to the information that is most relevant to the task at hand. In radiology, this ability manifests as fewer fixations and shorter dwell time in anatomical regions that are considered irrelevant to the observer’s task. Although these findings are generally viewed as an advantage of expertise, this study explored the potential negative effects of top-down guidance when cases had abnormalities that were inconsistent with the observer’s expectations (i.e., incidental findings). Thirty-seven radiologists evaluated abdominal CT scans. One group was told the patients were living liver donor candidates and the other group was told they were living kidney donor candidates. Critically, two of the cases had liver abnormalities and two had kidney abnormalities. Overall, abnormalities in the uncued organ were missed ~6% more often than in the cued organ, but this difference was not significant and Bayes factors were inconclusive. Using eye-tracking measures, which provide a more sensitive measure of search behavior, we found that the uncued organ was searched less thoroughly than the cued organ. There was no significant difference in scanning/drilling behavior between groups. There was no relationship between experience and missed incidental-finding rates. Furthermore, radiologists across all levels of experience were equally likely to focus less attention on the uncued organ. Although previous research has found group-level differences between experts and naïve observers in incidental-finding rates [1], these findings add to growing evidence that expertise does not protect experts from missing incidental findings [2].
Radiomics and deep transfer learning have attracted broad research interest in developing and optimizing CAD schemes for medical images. However, these two technologies are typically applied in different studies using different image datasets, and the advantages or potential limitations of applying them in CAD applications have not been well investigated. This study aims to compare and assess these two technologies in classifying breast lesions. A retrospective dataset including 2,778 digital mammograms is assembled, in which 1,452 images depict malignant lesions and 1,326 images depict benign lesions. Two CAD schemes are developed to classify breast lesions. The first scheme segments lesions and computes radiomics features, while the second applies a pre-trained residual network (ResNet50) as a transfer learning model to extract automated features. Next, the same principal component analysis (PCA) algorithm is used to process both the initially computed radiomics features and the automated features to create optimal feature vectors by eliminating redundant features. Then, several support vector machine (SVM)-based classifiers are built using the optimized radiomics or automated features. Each SVM model is trained and tested using a 10-fold cross-validation method. Classification performance is evaluated using the area under the ROC curve (AUC). The two SVMs trained using radiomics and automated features yield AUCs of 0.77±0.02 and 0.85±0.02, respectively. In addition, an SVM trained using the fused radiomics and automated features does not yield a significantly higher AUC. This study indicates that (1) using deep transfer learning yields higher classification performance, and (2) radiomics and automated features contain highly correlated information for lesion classification.
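The PCA redundancy-elimination step can be sketched with a plain eigendecomposition (numpy only; the study's exact component-selection criterion is not stated, so a 95% explained-variance cutoff is an assumption here):

```python
import numpy as np

def pca_reduce(features, var_kept=0.95):
    """Project feature vectors (one row per lesion) onto the leading
    principal components that together explain `var_kept` of the total
    variance, discarding redundant (correlated) feature directions."""
    centered = features - features.mean(axis=0)
    cov = np.cov(centered, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)          # ascending order
    order = np.argsort(eigvals)[::-1]               # sort descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    explained = np.cumsum(eigvals) / eigvals.sum()
    k = int(np.searchsorted(explained, var_kept)) + 1
    return centered @ eigvecs[:, :k]

# Three perfectly collinear features collapse to a single component.
rng = np.random.default_rng(0)
base = rng.normal(size=(100, 1))
reduced = pca_reduce(np.hstack([base, 2 * base, -base]))
```

Applying the same reduction to both feature sets, as the study does, keeps the downstream SVM comparison fair.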
Characterization of case-based classification repeatability and of the variability of operating points can complement measures of classification performance in artificial intelligence/computer-aided diagnosis (AI/CADx). Building upon our previous work in this area using human-engineered radiomic features extracted from dynamic contrast-enhanced magnetic resonance (DCE-MR) images, we investigated the application of these methods to features extracted from pre-trained convolutional neural networks using deep transfer learning. The second post-contrast DCE-MR images for 601 unique breast lesions (194 benign, 407 malignant) were cropped and resized for input into a VGG-19 network pre-trained on ImageNet. Features were extracted and average-pooled from the five max-pool layers, resulting in 1,472 features for each lesion. The assignment of cases to training and test sets was varied using a 1000-iteration 0.632 bootstrap. A random forest classifier was used to investigate overall classification performance in distinguishing between malignant and benign cases (using the area under the receiver operating characteristic curve (AUC) with 0.632+ bootstrap correction), case-based classification repeatability (using repeatability profiles, which measure the 95% confidence interval (CI) of classifier output across its range), and attainment of a ‘preferred’ target (95%) or ‘optimal’ sensitivity and specificity. The AUC (median, [95% CI]) was 0.862 [0.806, 0.899]. The repeatability profile and the attained sensitivity and specificity were similar to previous results for both the ‘preferred’ and ‘optimal’ targets when using human-engineered radiomic features. These results demonstrate the application of these methods to complement AI/CADx model assessment when using deep transfer learning features.
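The bootstrap resampling of case assignments can be sketched in plain Python (the 0.632+ correction applied to the AUC itself is omitted for brevity):

```python
import random

def bootstrap_splits(n_cases, n_iter=1000, seed=0):
    """Yield (train, test) index lists for a 0.632 bootstrap: each
    training set is drawn with replacement, and the test set is the
    out-of-bag cases, which on average comprise ~36.8% of the data."""
    rng = random.Random(seed)
    for _ in range(n_iter):
        train = [rng.randrange(n_cases) for _ in range(n_cases)]
        test = sorted(set(range(n_cases)) - set(train))
        yield train, test

# The out-of-bag fraction approaches 1/e ≈ 0.368 for large n, which is
# where the 0.632 weighting in the estimator comes from.
fractions = [len(test) / 601 for _, test in bootstrap_splits(601, n_iter=200)]
avg_oob = sum(fractions) / len(fractions)
```

Varying the train/test assignment this way, rather than using a single split, is what makes the per-case repeatability profiles possible.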
Since the sensitivity of mammography is limited in detecting subtle cancers, we propose to develop a new computer-aided detection (CAD) scheme to generate a quantitative image marker predicting the risk of having mammography-occult cancer detectable by breast MRI. The study is based on the hypothesis that overall breast density and bilateral asymmetry of breast density between the left and right breasts are associated with a higher risk of developing breast cancer in the short term. Thus, a new CAD scheme is developed to process images and analyze bilateral asymmetry of mammographic density and tissue structure. From the computed image features, a machine learning model is trained to generate an image marker, or likelihood score, predicting the risk of having mammography-occult tumors. In this presentation, we report two cases in which screening mammograms were rated BI-RADS 2 by radiologists. Neither woman qualified for breast MRI screening, due to the low risk scores predicted by existing epidemiology-based risk models. The CAD scheme analyzed the mammograms of these two cases and produced high risk scores for having mammography-occult tumors. After breast MRI screening was applied, two mammography-occult tumors were detected. Biopsy results confirmed one invasive ductal carcinoma (grade 3) and one high-risk solitary breast papilloma, which needed to be removed by surgery. This study demonstrates the potential advantages of applying a CAD-generated image marker to detect abnormalities, or to predict cancer risk, that radiologists miss or overlook. It can thus increase the efficacy of using MRI as an adjunct to mammography to detect more subtle cancers.
Convolutional neural network (CNN)-based denoisers have shown promising results in low-dose CT (LDCT) denoising. However, image blur is a problem that needs to be addressed, because it deforms or eliminates small features, which interferes with diagnosis. Pixel-level loss, such as mean-squared-error (MSE) loss, used for CNN training is the cause of image blur in the denoised image, because a pixel-level loss computes the average of all pixel-value differences without attention to important features. To resolve the image blur, we propose to use an activation map for training the CNN denoiser. The activation map indicates the area on which a CNN classifier focuses when classifying the image. We train a CNN classifier to classify lesion-present and lesion-absent CT images (i.e., a binary detection task), and then obtain the activation map of an image using the trained classifier. We observe that lesions and edges are activated in the activation map; therefore, when the activation map is multiplied by the image, small features are emphasized. We train the CNN denoiser in two steps. First, we train the denoiser using LDCT and normal-dose CT (NDCT) image pairs. In the second step, we fine-tune the network parameters of the denoiser using LDCT and NDCT image pairs multiplied by the NDCT activation map. The two-step trained CNN denoiser effectively reduces noise while preserving small features.
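The multiplication in the second training step amounts to a spatially weighted fidelity loss. A numpy sketch of that idea (illustrative only; the normalization of the activation map is an assumption, and the actual training would use an autodiff framework):

```python
import numpy as np

def activation_weighted_mse(denoised, target, activation_map):
    """Fine-tuning loss of the second training step: both images are
    multiplied by the (max-normalized) NDCT activation map before the
    MSE, so errors in highly activated regions (lesions, edges)
    dominate the loss while flat background contributes less."""
    w = activation_map / activation_map.max()
    return np.mean((w * denoised - w * target) ** 2)

# With a uniform activation map the loss reduces to the plain MSE.
loss = activation_weighted_mse(np.full((4, 4), 2.0),
                               np.zeros((4, 4)),
                               np.ones((4, 4)))
```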
Radiologist-AI interaction is a novel area of research with potentially great impact. It has been observed in the literature that radiologists’ performance deteriorates toward the end of a shift and that there is a visible change in their gaze patterns. However, the quantitative features in these patterns that would be predictive of fatigue have not yet been discovered. A radiologist was recruited to read chest X-rays while his eye movements were recorded. His fatigue was measured using a target concentration test and the Stroop test, with the number of analyzed X-rays serving as the reference fatigue metric. A framework with two convolutional neural networks, based on the UNet and ResNeXt50 architectures, was developed for the segmentation of lung fields. This segmentation was used to analyze the radiologist’s gaze patterns. With a correlation coefficient of 0.82, the eye-gaze features extracted from the lung segmentation exhibited the strongest fatigue-predictive power in comparison with alternative features.
Architectures based on artificial intelligence and deep learning have great potential to support COVID-19 diagnosis. However, architectures designed at the beginning of the pandemic in particular were built on databases that did not contain a substantial number of chest X-ray images of COVID-19 patients. The present work presents a comparison of three deep learning architectures (COVID-Net, CovXNet and DarkCovidNet) for COVID-19 diagnosis using chest X-ray images. First, the architectures were implemented with the databases provided by their authors, to compare the results with those reported in the state of the art. Then, a new database with more than 9000 chest X-ray images of COVID-19, pneumonia, and healthy patients (3305 images per class) was assembled using databases from four different institutions around the world. This database was then used to evaluate the original architectures, to retrain them, and finally to evaluate the performance of the retrained architectures and compare the results. The architectures with the best performance and generalizability were DarkCovidNet and CovXNet with a support vector machine stacking algorithm, with accuracies of 94.04% and 92.02%, respectively, on the test data of the new database.
A variety of deep neural network (DNN)-based image denoising methods have been proposed for use with medical images. These methods are typically trained by minimizing loss functions that quantify a distance between the denoised image and a defined target image (e.g., a noise-free or low-noise image). They have demonstrated high performance in terms of traditional image quality metrics such as root mean square error (RMSE), the structural similarity index metric (SSIM), or peak signal-to-noise ratio (PSNR). However, it has been reported that these denoising methods may not improve objective, task-based measures of image quality (IQ). In this work, a task-informed model training method that preserves task-specific information is established and systematically evaluated with clinically realistic simulated low-dose X-ray computed tomography (CT) images. Specifically, binary signal detection tasks under signal-known-statistically (SKS) and background-known-statistically (BKS) conditions are considered. The low-dose CT denoising networks are first pre-trained using a mean-squared-error (MSE) loss function. A fully connected layer with a sigmoid activation function is subsequently appended to the denoising network; this layer can be interpreted as a single-layer neural network-based numerical observer (SLNN-NO). A hybrid loss function consisting of a binary cross-entropy loss and an MSE loss is employed to jointly fine-tune the denoising network and train the SLNN-NO. The performance of the SLNN-NO on denoised data is quantified to evaluate the impact of the task-informed training procedure on the denoising network. The presented results indicate that the task-informed training method can improve observer performance while providing control over the trade-off between traditional and task-based measures of image quality.
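The hybrid objective can be written compactly. A numpy sketch of the loss for a single image (the weighting term `lam` is an assumption; the abstract states only that the two terms are combined and traded off, and actual training would use an autodiff framework):

```python
import numpy as np

def hybrid_loss(denoised, target, observer_score, label, lam=1.0):
    """Task-informed fine-tuning loss: binary cross-entropy on the
    numerical observer's detection score (signal present vs. absent)
    plus lam times the MSE fidelity term to the target image. `lam`
    sets the trade-off between task-based and traditional IQ."""
    eps = 1e-12  # guard against log(0)
    bce = -(label * np.log(observer_score + eps)
            + (1 - label) * np.log(1 - observer_score + eps))
    mse = np.mean((denoised - target) ** 2)
    return bce + lam * mse

# A perfectly denoised image with a maximally uncertain observer score
# leaves only the cross-entropy of 0.5, i.e. ln(2) ≈ 0.693.
loss = hybrid_loss(np.zeros((8, 8)), np.zeros((8, 8)), 0.5, label=1)
```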
Two common regularization methods in the reconstruction of magnetic resonance images are total variation (TV), which restricts the magnitude of the gradient in the reconstructed image, and wavelet sparsity, which assumes that the object being imaged is sparse in the wavelet domain. These regularization methods have resulted in images with fewer undersampling artifacts and less noise, but they introduce artifacts of their own. In this work, we extend previous results on modeling human observer performance for images reconstructed with TV regularization to also predict human detection performance under wavelet regularization and a combination of wavelet and TV regularization. Small lesions were placed in the coil k-space data of fluid-attenuated inversion recovery (FLAIR) brain images from the fastMRI database. The data were undersampled using an acceleration factor of 3.48 and reconstructed using a range of regularization parameters for both the TV and wavelet constraints. The internal noise level of the sparse difference-of-Gaussians (S-DOG) model observer was chosen to match the average human percent correct in two-alternative forced choice (2-AFC) studies with a signal known exactly, variable backgrounds, and no regularization. The S-DOG model largely tracked the human observer results, except at large values of the regularization parameter, where it outperformed the average human observer. We found that regularization with either constraint alone or in combination did not improve human observer performance for this task.
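For reference, the TV penalty this abstract refers to can be sketched as follows. This is a generic isotropic discrete total variation for a 2-D image, not the authors' reconstruction code, and it uses the simplest forward-difference convention at the boundaries.

```python
import math

def total_variation(img):
    """Isotropic discrete total variation of a 2-D image (list of
    rows): sum over pixels of the gradient magnitude computed with
    forward differences. The last row and column contribute no terms."""
    tv = 0.0
    for i in range(len(img) - 1):
        for j in range(len(img[0]) - 1):
            dx = img[i + 1][j] - img[i][j]
            dy = img[i][j + 1] - img[i][j]
            tv += math.sqrt(dx * dx + dy * dy)
    return tv
```

A TV-regularized reconstruction adds a multiple of this quantity to the data-fidelity term, which suppresses noise and undersampling artifacts but, as the abstract notes, can introduce its own (e.g., staircasing) artifacts.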
In the medical imaging field, task-based metrics of image quality have been advocated as a means of evaluating the performance of imaging systems and/or reconstruction algorithms. One way of obtaining these metrics is through a numerical observer. Although the Bayesian ideal observer (IO) is optimal by definition, it is frequently intractable and nonlinear. Therefore, linear approximations to the IO are sometimes used to obtain task-based statistics. The optimal linear observer for maximizing the signal-to-noise ratio (SNR) of the test statistic is the Hotelling observer (HO). However, the computational cost of obtaining the HO increases with image size and becomes intractable for large-scale images. For multimodal data this is an even greater issue, because each additional modality dramatically increases the size of the composite image. An alternative to computing the HO directly is approximating its test statistic using a feed-forward neural network (FFNN). However, such methods of learning the HO have not been evaluated on multimodal data. In this work, a tractable learned multimodal observer is implemented. The considered task is a signal-known-statistically/background-known-statistically binary signal detection task. A stylized operator representing an ultrasound computed tomography imaging system and numerical breast phantoms with speed-of-sound and attenuation modalities are considered. The considered signal is a microcalcification cluster with a random amplitude. It is demonstrated that the learned observer can closely approximate the HO for the considered task.
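As background, the Hotelling observer named above applies the linear template w = S⁻¹Δμ, where S is the (class-averaged) covariance matrix and Δμ the difference of the class means; its cost is dominated by inverting S, which is why large composite multimodal images make it intractable. A minimal sketch on a hypothetical 1-D Gaussian toy problem (not the paper's ultrasound data):

```python
import numpy as np

rng = np.random.default_rng(0)
n_pix, n_per_class = 16, 5000

# hypothetical known signal profile on a 1-D "image"
signal = np.zeros(n_pix)
signal[6:10] = 1.0

# signal-absent and signal-present training images: i.i.d. Gaussian background
absent = rng.normal(0.0, 1.0, (n_per_class, n_pix))
present = rng.normal(0.0, 1.0, (n_per_class, n_pix)) + signal

# Hotelling template: w = S^{-1} (mu_present - mu_absent),
# with S the average of the two class covariance matrices
S = 0.5 * (np.cov(absent.T) + np.cov(present.T))
delta_mu = present.mean(axis=0) - absent.mean(axis=0)
w = np.linalg.solve(S, delta_mu)

# test statistics and the detectability SNR of the linear observer
t_a, t_p = absent @ w, present @ w
snr = (t_p.mean() - t_a.mean()) / np.sqrt(0.5 * (t_p.var() + t_a.var()))
```

The S⁻¹Δμ solve is what a learned FFNN observer avoids: it regresses the test statistic directly from training images, sidestepping the covariance inversion that scales poorly with composite image size.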
The Hotelling observer (HO) is a commonly used linear observer for detection and classification tasks. The conventional implementation, which operates on binned data, involves inverting a covariance matrix and estimating the difference in the means of two vectors. However, this conventional calculation cannot be applied directly to list-mode data. The situation is salvageable by using the attribute list to construct a Poisson point process in attribute space (Ref. 1), which makes the computation of the HO quite different. In this work, we present an example of computing the HO test statistic on list-mode data. Observer performance is measured on a signal-known-exactly and background-known-statistically task. The receiver operating characteristic (ROC) curve of the HO on list-mode data is compared to the corresponding approximation on binned data obtained with the supervised learning methods proposed in Ref. 2, where a single-layer neural network (SLNN) is used to approximate the HO test statistic. The comparison shows that the HO on list-mode data outperforms the HO on binned data, again demonstrating that list-mode data contain more information than their binned counterpart.
The use of convolutional neural networks (CNNs) for establishing anthropomorphic numerical observers (ANOs) is being actively explored. In these data-driven approaches, CNNs are trained in a standard supervised way with human-labeled training data; hence, the anthropomorphic component of the procedure resides only in the training labels. However, it is well known that such traditionally trained CNNs can rely on image features that are highly specific to the training distribution and may not align with the features exploited by human perception. While able to predict human observer performance under certain specified conditions, traditionally trained CNNs lack the interpretability and robustness that may be desired for an ANO. To address this, in this work we investigate the use of an adversarially robust training strategy for training CNN-based observers. As recently demonstrated in the computer vision literature, this training strategy can result in CNNs that exploit more human-interpretable features than a standard CNN would employ. Robustly trained CNNs are systematically investigated for performing a signal-known-exactly (SKE) and background-known-statistically (BKS) binary detection task. Additionally, a differential evolution-based optimization procedure is developed to establish robustly trained CNNs that achieve a specified performance, which may provide a new approach to establishing ANOs.
Objective image quality metrics (IQMs) are widely developed and utilized because they can guide the optimization of radiation dose in computed tomography (CT) imaging. However, how well these IQMs relate to a radiologist’s perception of subjective image quality, which is the gold standard for assessing diagnostic image quality, has not been fully explored. Therefore, in this study we analyze the relationship between subjective and objective quality metrics. We compared 13 full-reference and no-reference IQMs: root mean square error, peak signal-to-noise ratio, structural similarity index (SSIM), multi-scale SSIM, information content weighted (IW)-SSIM, gradient magnitude similarity deviation, feature similarity index, noise quality metric, visual information fidelity, natural image quality evaluator, blind/referenceless image spatial quality evaluator, perception-based image quality evaluator, and the non-prewhitening with eye filter (NPWE) model observer. The data used in this study were CT images at seven noise levels. The scores obtained from these data with the objective IQMs were compared with the scores of three radiologists using the Pearson linear correlation coefficient (PLCC) and Spearman’s rank-order correlation coefficient (SROCC). The results show that SSIM performs best in terms of PLCC and SROCC but misses some characteristics of the radiologists’ assessments. Full-reference IQMs, except for IW-SSIM, generally outperform no-reference IQMs. No-reference IQMs show poor PLCC and SROCC scores, and the NPWE model observer shows the worst performance among them. These results may contribute to evaluating and developing IQMs that better reflect radiologists’ preferences.
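The two correlation measures used in this comparison can be computed as follows. This is a bare-bones sketch: the Spearman coefficient here is simply the Pearson coefficient of the ranks and does not handle tied values, which a production implementation (or a statistics library) would.

```python
import math

def pearson(x, y):
    """Pearson linear correlation coefficient (PLCC)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman's rank-order correlation coefficient (SROCC):
    Pearson correlation of the ranks. Ties are not handled."""
    def ranks(v):
        order = sorted(range(len(v)), key=lambda i: v[i])
        r = [0.0] * len(v)
        for rank, i in enumerate(order):
            r[i] = rank + 1.0
        return r
    return pearson(ranks(x), ranks(y))
```

The contrast between the two is the point of using both: PLCC rewards a linear relation between IQM scores and radiologist scores, while SROCC rewards any monotone relation.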
We have built upon an existing freehand 3D ultrasound imaging technique, integrating an electromagnetic tracker with a 2D probe to enable display-less scanning by novice users at a local site and remote reading. Seventy-two volumes were generated using a reconstruction algorithm from data collected by three users in a single longitudinal sweep across a 23-week fetus phantom in four different configurations, for six scan durations ranging from 5 s to 30 s. The acquisition is semi-blinded: the user knows the fetal orientation but scans without the image display and guidance of a conventional scan. Three non-expert readers and one expert radiologist extracted the clinically relevant planes and measured four key biometric features from the 3D images. In this paper, we propose (1) a risk metric R to rate the quality of the scan as a function of probe motion and contact, and (2) a measurability index M for the availability of the 2D planes within the volume and the visibility of the biometric features. Our analysis shows that R is lowest and M highest for 15-s acquisitions, corresponding to an average transducer sweep speed of 2.4 cm/s. This finding is consistent with the reported speed range of 3-4 cm/s recommended for a low-cost teleradiology solution for 2D ultrasound. The errors in average biometric measurements, compared to the 50th-percentile values in the fetal biometry tables for the corresponding gestational week, are within -3.8% to 5.7%. R, M, and the accuracy and precision of the measurements are useful indicators of the performance of the 3D ultrasound system.
This study investigated whether radiologists from different countries share the same sensitivity to certain mammographic features. Retrospective data were collected from Chinese and Australian radiologists reading a high-density test set containing 40 normal and 20 cancerous mammographic cases. Sixteen Australian radiologists and 30 Chinese radiologists, including 18 from Nanchang and 12 from Hong Kong SAR/Shenzhen, were asked to read all images in the test set using the Royal Australian and New Zealand College of Radiologists (RANZCR) rating system and to annotate suspicious lesions. For each case and each radiologist group, the percentage of radiologists making the correct diagnosis was calculated. For cancer cases, we also calculated the percentage of radiologists who located the lesion correctly. The Spearman correlation coefficient was used to explore the association between the two radiologist groups. The data demonstrated a high correlation between Chinese and Australian radiologists in identifying cancer cases (r=0.839, p<0.0001) and locating lesions (r=0.802, p<0.0001), but no statistically significant relationship in identifying normal cases (r=0.236, p=0.142). Between radiologists from the two geographic regions of China, however, strong correlations were found in detecting cancer cases (r=0.686, p=0.0008), marking lesions (r=0.803, p<0.0001), and recognizing normal cases (r=0.562, p=0.0002). In conclusion, although Chinese and Australian radiologists may share the same difficulty in diagnosing and locating cancers, they differed in the challenge of identifying normal cases. The performance of radiologists within China, although from different regions, remained consistent when reading high-density mammograms.
Unlike Australia, China has no population-based early-detection screening program, with radiological expertise being a barrier to implementation. This study explores differences in observer performance between breast radiologists from China and Australia, and the role of peer-assisted reading in Chinese radiologists’ performance. A test set of 60 high-density screening mammograms (40 normal, 20 cancer cases) was constructed; eight Chinese and 17 Australian radiologists read the test set independently, while another ten Chinese radiologists read it as peer-duos, in which discussion was encouraged but lesions were marked separately. For independent readings by radiologists who read more than 20 cases per week, Chinese readers had lower sensitivity, lesion sensitivity, AUC, and JAFROC figures of merit. There was no significant difference in performance between Chinese readers reading independently and those reading with peer assistance, so this strategy may have limited value in improving diagnostic efficacy.
Deep learning-based (DL) reconstruction, trained mostly on patient data (or a combination of patient and phantom data), has been introduced in CT, with two major manufacturers offering such methods in the clinic. Our purpose was to investigate the influence of DL-based reconstruction on object detectability compared to the current standard of iterative reconstruction in routine CT head protocols, using a model observer to analyze the detectability of lesion-like objects (brain, bone, and lung tissue equivalent; 5 mm diameter, 25 mm length) in a commercial anthropomorphic head phantom. The phantom was scanned 10 times on two CT systems (same manufacturer, different models) with the routine head protocol, and images were reconstructed with filtered back projection (FBP), iterative reconstruction (IR), and DL-based methods. As input for the model observer, ROIs were extracted centered on the locations of the cylinders, and for each of them four background locations were selected nearby. The locations of the ROIs in the phantom were analogous for both scanners’ data. The non-prewhitening matched filter with an eye filter (NPWE) model observer was applied (Burgess eye filter, peak at 4 cy/deg, 50 cm eye-monitor distance). On visual inspection, the phantom brain background ROIs showed differences in noise texture between the reconstruction methods, with a more uniform distribution for the DL-based methods on both CT systems. The average d’ and range were, for system 1: [lung: FBP -124.9 (-178.2, -99.1); IR -126.7 (-188.2, -102.9); DL -136.2 (-181.9, -119.3)]; [bone: FBP 206.7 (166.7, 269.7); IR 215.4 (175.8, 278.1); DL 268.3 (215.3, 339.5)]; [soft tissue: FBP -14.6 (-19.6, -9.8); IR -15.5 (-20.7, -10.2); DL -18.8 (-24.6, -10.6)]. The NPWE model obtained consistently higher d’ values in the DL-based reconstructed images compared to iterative and FBP for all three materials on both systems.
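As a rough illustration of the kind of computation involved, a frequency-domain NPWE detectability index can be sketched as follows. The generic 1-D radial form and the Burgess-style eye filter parameterization E(f) = f·exp(-cf), which peaks at f = 1/c, are assumptions made for illustration, not the authors' implementation.

```python
import math

def eye_filter(f, f_peak=4.0):
    """Burgess-style eye filter E(f) = f * exp(-c f); c is chosen so
    that the filter peaks at f_peak (e.g., 4 cy/deg)."""
    c = 1.0 / f_peak
    return f * math.exp(-c * f)

def npwe_dprime(freqs, delta_s2, nps):
    """NPWE detectability index on a discretized 1-D radial spectrum:
    delta_s2 is |DeltaS(f)|^2, the expected signal power difference,
    and nps is the noise power spectrum sampled at the same freqs."""
    num = sum(eye_filter(f) ** 2 * s2 for f, s2 in zip(freqs, delta_s2))
    den = sum(eye_filter(f) ** 4 * s2 * n
              for f, s2, n in zip(freqs, delta_s2, nps))
    return num / math.sqrt(den)
```

In this form, doubling the NPS at every frequency lowers d’ by a factor of √2, matching the intuition that noisier reconstructions reduce detectability.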
Fluoroscopic imaging is used to dynamically image a patient’s internal anatomy and physiology during an examination. Current methods for evaluating fluoroscopic imaging performance do not challenge systems in real time or with clinically meaningful tasks. This work presents a methodology for task-specific quantification of the imaging performance of clinical fluoroscopy systems through reader assessments of live fluoroscopic images. First, a set of clinically relevant tasks was developed based on the internationally recognized grading scale for vesicoureteric reflux (VUR) in pediatric patients. Tasks were generated to represent VUR grades 2 to 5 and were printed using iodine-ink 2D printing. Tasks were described by the total number of pages, i.e., total iodine contrast, and by the VUR grade of the task itself. In total, 24 combinations of contrast and grade were assessed. Images of each task were acquired under three experimental conditions: first, with a high-dose protocol on a flat-panel-detector clinical system; second, with a low-dose protocol on the same flat-panel-detector system; and third, with a comparable high-dose protocol on an image-intensifier clinical system. Readers assessed the imaging tasks in the clinical environment in two ways: (1) detection (VUR present or absent) and (2) identification of the VUR grade. The results of the reader study indicate that, after application of a scoring scheme, a metric quantifying the task performance of fluoroscopy systems may be obtained. The evaluation process outlined in this work will enable a standard mechanism for quantitative comparison across fluoroscopic systems, technologies, and protocols.
Noise texture in CT images, commonly characterized using the noise power spectrum (NPS), is dictated mainly by the shape of the reconstruction kernel. The peak frequency of the NPS (fpeak) is often used as a one-parameter metric for characterizing noise texture. However, if the downslope of the NPS beyond fpeak visibly influences noise texture, then fpeak is insufficient as a single descriptor. We therefore investigated the human-detectable differences between NPSs having different fpeak and/or downslope parameters. NPSs were estimated using various reconstruction kernels on a commercial CT scanner. To quantify the NPS downslope, half of a Gaussian function was fit to the portion of the NPS that lies beyond fpeak, and the σ of this Gaussian was used as the downslope descriptor. A two-alternative forced choice observer study was performed to determine the just-noticeable differences (JND) in fpeak only, in σ only, and in both simultaneously. Visibility thresholds for these changes were determined, and an elliptical detectability boundary was obtained. The JND threshold ellipse is centered on the reference values and has major and minor radii of 0.47 lp/cm and 0.12 lp/cm, respectively; the major radius makes an angle of 143° with the x-axis. A change in fpeak alone of 0.2 lp/cm is below the detection threshold, although this number changes if the apodization part of the NPS changes simultaneously. In conclusion, both the peak frequency and the apodization section of the NPS influence the detectability of changes in image noise texture.
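The downslope descriptor described above can be sketched as follows. Because a half-Gaussian anchored at the peak gives a closed-form σ for each sample beyond the peak, this sketch (with hypothetical function names) averages the per-sample estimates instead of running a least-squares fit, which is a simplification of what a real fitting routine would do.

```python
import math

def downslope_sigma(freqs, nps):
    """Estimate the sigma of a half-Gaussian describing the downslope
    of a 1-D NPS beyond its peak. For N(f) = A * exp(-(f - fpeak)^2 /
    (2 sigma^2)), each sample past the peak yields sigma in closed
    form; noisy data are handled by averaging the estimates."""
    i_peak = max(range(len(nps)), key=lambda k: nps[k])
    f_peak, amp = freqs[i_peak], nps[i_peak]
    estimates = []
    for f, n in zip(freqs[i_peak + 1:], nps[i_peak + 1:]):
        if 0.0 < n < amp:  # only samples strictly below the peak are informative
            estimates.append(
                math.sqrt(-(f - f_peak) ** 2 / (2.0 * math.log(n / amp))))
    return sum(estimates) / len(estimates)
```

A small σ corresponds to a sharp cutoff kernel (fast NPS roll-off), a large σ to a smoother, more apodized kernel; together with fpeak this gives the two-parameter texture description the abstract uses.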
CT image quality depends on radiation dose: low-dose CT (LDCT) scans contain increased image noise, which compromises diagnostic performance. It is therefore desirable to perform image quality assessment (IQA) before the diagnostic use of CT scans. Often, image quality is assessed with full-reference methods, in which an LDCT scan is algorithmically compared against its full-dose counterpart. However, owing to health concerns, acquiring full-dose CT scans is challenging and undesirable. As an alternative, no-reference IQA (NR-IQA) can be performed. Moreover, IQA at the pixel level is important, as most IQA methods provide only a global assessment, meaning localized regions of interest cannot be specifically assessed. A solution for localized IQA is to produce visually interpretable quality maps, and deep learning methods can be employed here by leveraging computer vision techniques such as self-supervised learning (SSL). In this work, we propose Noise2Quality (N2Q), a novel self-supervised, no-reference, pixel-wise image quality assessment model that predicts IQA maps from LDCT scans. Self-supervised dose-level prediction as an auxiliary task further improves model performance. Our experimental evaluation, both qualitative and quantitative, demonstrates the effectiveness of the model in accurately predicting IQA maps compared with various baselines.
Purpose: This study analyzes a social distance monitoring and contact tracing assistance tool for preventing the spread of COVID-19 in a busy indoor hospital working environment. Method: A camera-based tool was developed that estimates the physical distance between multiple individuals in real time and also tracks individuals and records their contact time when social-distancing requirements are violated, for retrospective review. Both stereo- and monocular-camera tools were implemented, and their accuracy and efficiency were evaluated and compared. Video was captured by a ZED M camera set close to the ceiling of a lab space. Three people within the field of view of the camera completed various movements. The distance between each pair (binary: <6 feet or >6 feet) and their contact time were recorded as ground truth and compared to the video software analysis. Results: The overall accuracy of social distance detection was 95.1% and 74.4%, with a false-negative rate (the tool predicting that individuals are far enough apart when they are actually too close) of 7.2% and 23.5%, for the stereo and monocular tools, respectively. Conclusions: A stereo-camera social distance monitoring and contact tracing assistance tool can accurately detect social distance among multiple people and keep an accurate contact record for each individual. While the monocular-camera tool provided some level of accuracy, the stereo-camera tool was shown to be superior.
Identification of abnormalities in radiology is predicated on a gestalt understanding of normal imaging findings. This study assesses whether perceptual training using high-volume chest radiography (HVCXR) can help develop an understanding of the normal appearance of a chest radiograph (CXR) and improve the ability to identify pulmonary nodules on CXR. Eight radiology residents were split into two groups: the experimental group received HVCXR training, viewing 500 CXRs at a rate of one CXR every 3 seconds, while the control group did not. Both groups were then tasked with identifying pulmonary nodules on a set of chest radiographs. Afterwards, the two groups switched interventions and localized pulmonary nodules on a third case set of chest radiographs. Unexpectedly, performance at nodule identification was worse in both the experimental and control groups after they had received HVCXR training. We hypothesize that this decrease in performance was due to fatigue from the HVCXR intervention.