KEYWORDS: Receivers, Medical imaging, Space operations, Algorithm development, Medicine, Image compression, Current controlled current source, Electronic filtering
Despite theoretical and practical difficulties, we are attempting to extend receiver operating characteristic (ROC)
analysis to tasks with more than two classes. Previously we developed explicit analytical expressions for the
behavior of the ideal observer acting on univariate trinormal data, and for the region of support of the ideal
observer's decision variables when acting on bivariate trinormal data. Although explicit calculation of the ideal
observer's behavior for general underlying data is difficult, we have developed a new set of parameters for
describing the ideal observer's decision rule which may aid in analytic or numeric computation of the ideal
observer's behavior.
The development and application of multi-class BANN classifiers in computer-aided diagnosis methods motivated this
study in which we compared estimates produced by two-class and three-class BANN classifiers to true observations
drawn from simulated distributions. Observations were drawn from three Gaussian bivariate distributions with distinct
means and variances to generate G1, G2, and G3 simulated datasets. A two-class BANN was trained on each training
dataset for a total of ten different trained BANNs. The same testing dataset was run on each trained BANN. The average
and standard deviation of the resulting ten sets of BANN outputs were then calculated. This process was repeated with
three-class BANNs. Different sample numbers and values of a priori probabilities were investigated. The relationship
between the average BANN output and true distribution was measured using Pearson and Spearman coefficients, R-squared
and mean square error for two-class and three-class BANNs. There was significantly high correlation between
the average BANN output and true distribution for two-class and three-class BANNs; however, subtle non-linearities and
spread were found in comparing the true and estimated distributions. The standard deviations of two-class and three-class
BANNs were comparable, demonstrating that three-class BANNs can perform as reliably as two-class BANN
classifiers in estimating true distributions and that the observed non-linearities and spread were not simply due to
statistical uncertainty but were valid characteristics of the BANN classifiers. In summary, three-class BANN decision
variables were similar in performance to those of two-class BANNs in estimating true observations drawn from
simulated bivariate normal distributions.
Despite theoretical and practical difficulties, we are attempting to extend receiver operating characteristic (ROC)
analysis to tasks with more than two classes. Previously we investigated a univariate trinormal model for the
underlying data of a three-class ideal observer. Although analytically tractable, this is less realistic than a
multivariate data model. We have developed expressions for the region of support of the decision variable
probability density functions for bivariate trinormal underlying data, given certain constraints on the underlying
data covariance matrices. We hope these results will aid in developing computational methods for evaluating
observer performance under such a model.
We are attempting to extend receiver operating characteristic (ROC) analysis to tasks with more than two classes.
This is difficult because of the rapid increase in complexity, in general, of both observer behavior and evaluation
of its performance, as the number of classes involved increases. Many researchers have proposed addressing this
complexity by imposing simplifications on the model; for example, by using univariate data rather than the
bivariate data employed by the general three-class ideal observer. We have investigated a univariate trinormal
model for the underlying data of a three-class ideal observer. Although a reasonably complete description of the
ideal observer's behavior in this case is attainable, this behavior is more complicated than might intuitively be
expected.
We previously introduced a utility-based ROC performance metric, the "surface-averaged expected cost" (SAEC),
to address difficulties which arise in generalizing the well-known area under the ROC curve (AUC) to classification
tasks with more than two classes. In a two-class classification task, the SAEC can be shown explicitly to be
twice the area above the conventional ROC curve (1-AUC) divided by the arclength along the ROC curve. In
the present work, we show that in tasks comparing the performance of two observers whose behavior is described
by the proper binormal model, our proposed performance metric is consistent with AUC in the qualitative sense
of deciding which of the two observers is "better," and by how wide a margin.
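The two-class relation stated above (SAEC equals twice the area above the ROC curve divided by the arclength along it) can be illustrated with a minimal numerical sketch. The function names, the trapezoidal integration, and the use of the conventional binormal parameters `a` and `b` to generate a test curve are assumptions made for illustration; they are not taken from the original work, which concerns the proper binormal model.

```python
import math
from statistics import NormalDist

def saec_two_class(points):
    """Two-class SAEC, 2 * (1 - AUC) / arclength, from ROC operating points.

    points: list of (FPF, TPF) pairs ordered from (0, 0) to (1, 1).
    AUC uses the trapezoidal rule; arclength is the polyline length.
    """
    auc = 0.0
    arclength = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        auc += 0.5 * (y0 + y1) * (x1 - x0)
        arclength += math.hypot(x1 - x0, y1 - y0)
    return 2.0 * (1.0 - auc) / arclength

def binormal_roc(a, b, n=2001):
    """Sample a conventional binormal ROC curve, TPF = Phi(a + b * Phi^-1(FPF)).

    a, b and n are illustrative parameters, not values from the source.
    """
    nd = NormalDist()
    pts = [(0.0, 0.0)]
    for i in range(1, n):
        fpf = i / n
        pts.append((fpf, nd.cdf(a + b * nd.inv_cdf(fpf))))
    pts.append((1.0, 1.0))
    return pts

if __name__ == "__main__":
    print(saec_two_class([(0.0, 0.0), (1.0, 1.0)]))  # guessing observer
    print(saec_two_class(binormal_roc(1.0, 1.0)))
    print(saec_two_class(binormal_roc(2.0, 1.0)))
```

For the guessing observer (the chance diagonal) the AUC is 0.5 and the arclength is the square root of 2, so this sketch returns 1/sqrt(2); for a perfect observer the area above the curve vanishes and the value is 0. Smaller values therefore correspond to better performance, the opposite direction from AUC.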
The purpose of this study is to investigate three-class Bayesian artificial neural networks (BANN) in dynamic contrast-enhanced
MRI (DCE-MRI) CAD in distinguishing different types of breast lesions including ductal carcinoma in situ
(DCIS), invasive ductal carcinoma (IDC), and benign. The database contains 72 DCIS lesions, 124 IDC lesions, and 131
benign breast lesions (no cysts). Breast MR images were obtained with a clinical DCE-MRI scanning protocol. In 3D,
we automatically segmented each lesion and calculated its characteristic kinetic curve using the fuzzy c-means method.
Morphological and kinetic features were automatically extracted, and stepwise linear discriminant analysis was utilized
for feature selection in four subcategories: DCIS vs. IDC, DCIS vs. benign, IDC vs. benign, and malignant (DCIS +
IDC) vs. benign. Classification was automatically performed with the selected features for each subcategory using
round-robin-by-lesion two-class BANN and three-class BANN. The performances of the classifiers were assessed with
two-class ROC analysis. We failed to show any statistically significant differences between the two-class BANN and
three-class BANN for all four classification tasks, demonstrating that the three-class BANN performed similarly to the
two-class BANN. A three-class BANN is expected to be more desirable in the clinical arena for both diagnosis and
patient management.
We previously introduced a utility-based ROC performance metric, the "surface-averaged expected cost" (SAEC),
to address difficulties which arise in generalizing the well-known area under the ROC curve (AUC) to classification
tasks with more than two classes. In a two-class classification task, the SAEC can be shown explicitly to be
twice the area above the conventional ROC curve (1-AUC) divided by the arclength along the ROC curve. In
the present work, we show that for a variety of two-class tasks under the binormal model, the SAEC obtained
for the proper decision variable (the likelihood ratio of the latent decision variable) is less than that obtained for
the conventional decision variable (i.e., using the latent decision variable directly). We also justify this result
using a readily derived property of the arclength along the ROC curve under a given data model. Numerical
studies as well as theoretical analysis suggest that the behavior of the SAEC is consistent with that of the AUC
performance metric, in the sense that the optimal value of this quantity is achieved by the ideal observer.
KEYWORDS: Electronic filtering, Data modeling, Medical imaging, Solids, Decision support systems, Performance modeling, Space operations, Radiology, Receivers, Tissues
We have shown previously that an obvious generalization of the area under an ROC curve (AUC) cannot serve
as a useful performance metric in classification tasks with more than two classes. We define a new performance
metric, grounded in the concept of expected utility familiar from ideal observer decision theory, but which
should not suffer from the issues of dimensionality and degeneracy inherent in the hypervolume under the ROC
hypersurface in tasks with more than two classes. In the present work, we compare this performance metric
with the traditional AUC metric in a variety of two-class tasks. Our numerical studies suggest that the behavior
of the proposed performance metric is consistent with that of the AUC performance metric in a wide range of
two-class classification tasks, while analytical investigation of three-class "near-guessing" observers supports our
claim that the proposed performance metric is well-defined and positive in the limit as the observer's performance
approaches that of the guessing observer.
KEYWORDS: Signal to noise ratio, Reconstruction algorithms, Image restoration, Signal detection, Tomography, CT reconstruction, Detection and tracking algorithms, Data modeling, Systems modeling, Computed tomography
Signal detection by the channelized Hotelling (ch-Hotelling) observer is studied for tomographic
application by employing a small, tractable 2D model of a computed tomography (CT) system.
The primary goal of this manuscript is to develop a practical method for evaluating the ch-Hotelling
observer that can generalize to larger 3D cone-beam CT systems. The use of the ch-Hotelling observer for evaluating tomographic image reconstruction algorithms is also demonstrated. For a realistic model for CT, the ch-Hotelling observer can be a good approximation to the ideal observer.
The ch-Hotelling observer is applied to both the projection data and the reconstructed images. The difference in signal-to-noise ratio for signal detection in both of these domains provides a metric for evaluating the image reconstruction algorithm.
We have shown in previous work that an ideal observer in a classification task with N classes achieves the optimal receiver operating characteristic (ROC) hypersurface in a Neyman-Pearson sense. That is, the hypersurface obtained by taking one of the ideal observer's misclassification probabilities as a function of the other N^2-N-1 misclassification probabilities is never above the corresponding hypersurface obtained by any other observer. Due to the inherent complexity of evaluating observer performance in an N-class classification task with N>2, some researchers have suggested a generally incomplete but more tractable evaluation in terms of a hypersurface plotting only the N "sensitivities" (the probabilities of correctly classifying observations in the various classes). An N-class observer generally has up to N^2-N-1 degrees of freedom, so a given sensitivity will still vary when the other N-1 are held fixed; a well-defined hypersurface can be constructed by considering only the maximum possible value of one sensitivity for each achievable value of the other N-1. We show that optimal performance in terms of this generally incomplete performance
descriptor, in a Neyman-Pearson sense, is still achieved by the N-class ideal observer. That is, the hypersurface obtained by taking the maximal value of one of the ideal observer's correct
classification probabilities as a function of the other N-1 is never below the corresponding hypersurface obtained by any other observer.
We analyzed a variety of recently proposed decision rules for
three-class classification from the point of view of ideal observer
decision theory. We considered three-class decision rules which have
been proposed recently: one by Scurfield, one by Chan et al., and one
by Mossman. Scurfield's decision rule can be shown to be a special
case of the three-class ideal observer decision rule in two different
situations: when the pair of decision variables is the pair of
likelihood ratios used by the ideal observer, and when the pair of
decision variables is the pair of logarithms of the likelihood ratios.
Chan et al. start with an ideal observer model, where two of the
decision lines used by the ideal observer overlap, and the third line
becomes undefined. Finally, we showed that the Mossman decision rule
(in which a single decision line separates one class from the other
two, while a second line separates those two classes) cannot be a
special case of the ideal observer decision rule. Despite the
considerable difficulties presented by the three-class classification
task compared with two-class classification, we found that the
three-class ideal observer provides a useful framework for analyzing a
wide variety of three-class decision strategies.
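As a point of reference for the decision rules discussed above, the following is a minimal sketch of a three-class expected-utility-maximizing decision written in terms of the two likelihood ratios. The function name, the argument layout, and the example utility matrix are hypothetical; only the general principle (choose the decision with the largest expected utility) comes from standard ideal observer decision theory.

```python
def ideal_observer_decision(lr1, lr2, priors, utility):
    """Return the index (0, 1 or 2) of the decision with maximal expected utility.

    lr1 = p(x|class 1) / p(x|class 3), lr2 = p(x|class 2) / p(x|class 3).
    priors: (P1, P2, P3), the a priori class probabilities.
    utility[d][t]: utility of deciding class d when the truth is class t.
    Dividing every expected utility by p(x|class 3) does not change the
    ordering, so the rule depends on x only through (lr1, lr2).
    """
    weights = (priors[0] * lr1, priors[1] * lr2, priors[2] * 1.0)
    scores = [sum(utility[d][t] * weights[t] for t in range(3)) for d in range(3)]
    return max(range(3), key=lambda d: scores[d])

if __name__ == "__main__":
    # Illustrative utilities: 1 for a correct decision, 0 otherwise.
    identity_utility = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
    equal_priors = (1 / 3, 1 / 3, 1 / 3)
    print(ideal_observer_decision(5.0, 0.5, equal_priors, identity_utility))
```

Each pairwise comparison between two scores is linear in `lr1` and `lr2`, which is why the boundaries between decision regions in the likelihood-ratio plane are straight lines.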
Bayesian artificial neural networks (BANNs) have proven useful in two-class classification tasks, and are claimed to provide good estimates of ideal-observer-related decision variables (the a posteriori class membership probabilities). We wish to apply the BANN methodology to three-class classification tasks for computer-aided diagnosis, but we currently lack a fully general extension of two-class receiver operating characteristic (ROC) analysis to objectively evaluate three-class BANN performance. It is well known that "the likelihood ratio of the likelihood ratio is the likelihood ratio." Based on this, we found that the decision variable which is the a posteriori class membership probability of an observational data vector is in fact equal to the a posteriori class membership probability of that decision variable. Under the assumption that a BANN can provide good estimates of these a posteriori probabilities, a second BANN trained on the output of such a BANN should perform very similarly to an identity function. We performed a two-class and a three-class simulation study to test this hypothesis. The mean squared error (deviation from an identity function) of a two-class BANN was found to be 2.5×10^-4. The mean squared error of the first component of the output of a three-class BANN was found to be 2.8×10^-4, and that of its second component was found to be 3.8×10^-4. Although we currently lack a fully general method to objectively evaluate performance in a three-class classification task, circumstantial evidence suggests that two- and three-class BANNs can provide good estimates of ideal-observer-related decision variables.
Computer-aided diagnosis has great potential to improve performance of the detection and classification of abnormalities on breast ultrasound. Our goal is to develop a computerized tool that detects suspicious areas and distinguishes among false-positive
detections, benign lesions, and malignancies. One-step classification into 3 categories is a largely unexplored territory with many possibilities.
The computerized scheme first identifies potential lesions based on expected lesion shape and margin characteristics. Our main focus here, however, is the subsequent classification of the potential lesions into 3 categories. For this purpose, we use a 3-way Bayesian neural net (BNN) based on extracted image features of the lesion candidates.
The method was tested on a database of 858 cases (1832 images) consisting of complex and simple cysts, benign solid lesions, and malignant lesions. In order to verify whether performance conforms to expectations, the output of a 3-way classifier ("A" vs. "B" vs. "C") can be projected onto that of two 2-way classifiers ("A" vs. "B or C", and "A or B" vs. "C"). We compared the projected performance of the 3-way classifier to two specifically trained 2-way classifiers. The first task was to distinguish cancer from all other lesion candidates, and the second was to distinguish actual lesions from false-positive detections. For these tasks, the performance of the 2-way classifiers and the projected performance of the 3-way classifier were indistinguishable based on calculated means and standard deviations of ROC area (Az). For example, in round robin analysis the average Az values obtained with both approaches were 0.92 and 0.83, for the two tasks, with standard deviations of 0.006 and 0.010, respectively. The potential of 3-way classification is illustrated graphically through the estimated probability density functions of the three truth categories. We have implemented a promising computerized scheme for detection and subsequent one-step 3-way classification of breast lesions on ultrasound images. The method was tested on an extensive database. The main challenge is the development of an objective evaluation method of performance such as the equivalent of ROC analysis for the 2-class problem.
We expressed the performance of the three-class "guessing" observer in terms of the six probabilities which make up a three-class receiver operating characteristic (ROC) space, in a formulation in which "sensitivities" are eliminated in constructing the ROC space (equivalent to using false-negative fraction and false-positive fraction in a two-class task). We then show that the "guessing" observer's performance in terms of these conditional probabilities is completely described by a degenerate hypersurface with only two degrees of freedom (as opposed to the five required, in general, to achieve a true hypersurface in such a ROC space). It readily follows that the hypervolume under such a degenerate hypersurface must be zero. We then consider a "near-guessing" task; that is, a task in which the three underlying data probability density functions (PDFs) are nearly identical, controlled by two parameters which may vary continuously to zero (at which point the PDFs become identical). The hypervolume under the ROC hypersurface of an observer in the three-class classification task tends continuously to zero as the underlying data PDFs converge continuously to identity (a "guessing" task). The hypervolume under the ROC hypersurface of a "perfect" ideal observer (a task in which the three data PDFs never overlap) is also found to be zero in the ROC space formulation under consideration. This suggests that hypervolume may not be a useful performance metric in three-class classification tasks, despite the utility of the area under the ROC curve for two-class tasks.
We are using Bayesian artificial neural networks (BANNs) to classify mammographic masses. We investigated whether a BANN can estimate ideal observer decision variables to distinguish malignant, benign, and false-positive computer detections. Five features were calculated for 143 malignant and 125 benign mass lesions, and for 1049 false-positive computer detections, in 596 mammograms randomly divided into a training and testing set. A BANN was trained on the training set features and applied to the testing set features. We then used a known relation between three-class ideal observer decision variables and those used by a two-class ideal observer when two of three classes are grouped into one class, giving one decision variable for distinguishing malignant from non-malignant detections, and a second for distinguishing true-positive from false-positive computer detections. For comparison, we pooled the training data into two classes in the same two ways and trained two-class BANNs for these two tasks. The three-class BANN decision variables were essentially identical in performance to the specifically trained two-class BANNs. This is consistent with the theoretical observation that three-class ideal observer decision variables are directly related to those used by a two-class ideal observer.
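The grouping step described above can be sketched as follows, assuming the relation in question is the standard one in which the a posteriori probability of a pooled class is the sum of the a posteriori probabilities of its members. The function name and argument names are hypothetical, and the abstract does not spell out the exact form used in the study.

```python
def grouped_decision_variables(p_malignant, p_benign):
    """Map two three-class posterior estimates to two two-class decision variables.

    p_malignant, p_benign: estimated a posteriori probabilities for one detection;
    the false-positive posterior is implied as 1 - p_malignant - p_benign.
    Returns (malignant vs. non-malignant, true lesion vs. false-positive detection).
    """
    if p_malignant < 0 or p_benign < 0 or p_malignant + p_benign > 1 + 1e-12:
        raise ValueError("posterior estimates must be non-negative and sum to at most 1")
    dv_malignant_vs_rest = p_malignant            # benign and false positive pooled
    dv_lesion_vs_fp = p_malignant + p_benign      # malignant and benign pooled
    return dv_malignant_vs_rest, dv_lesion_vs_fp

if __name__ == "__main__":
    print(grouped_decision_variables(0.25, 0.5))
```

Each returned value can then be evaluated with ordinary two-class ROC analysis and compared against a BANN trained directly on the corresponding pooled two-class task.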
KEYWORDS: Error analysis, Probability theory, Statistical analysis, Artificial neural networks, Neural networks, Computer aided diagnosis and therapy, Data modeling, Signal detection, Mammography, Medical imaging
We are using Bayesian artificial neural networks (BANNs) to eliminate false-positive detections in our computer-aided diagnosis schemes. In the present work, we investigated whether BANNs can be used to estimate likelihood ratio, or ideal observer, decision functions for distinguishing observations which are drawn from three classes. Three univariate normal distributions were chosen representing three classes. We sampled 3,000 values of x for each of 10 training datasets, and 3,000 values of x for a single testing dataset. A BANN was trained on each training dataset, and the two outputs from each trained BANN, which estimate p(class 1|x) and p(class 2|x), were recorded for each value of x in the testing dataset. The mean BANN output and its standard error were calculated using the ten sets of BANN output. We repeated the above procedure to estimate the means and standard errors of the two likelihood ratio decision functions p(x|class 1)/p(x|class 3) and p(x|class 2)/p(x|class 3). We found that the BANN can estimate the a posteriori class probabilities quite accurately, except in regions of data space where outcomes are unlikely. Estimation of the likelihood ratios is more problematic, which we attribute to error amplification caused by taking the ratio of two imprecise estimates. We hope to improve these estimates by constraining the BANN training procedure.
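The target quantities that the BANN is asked to estimate in this simulation can be computed exactly when the class distributions are known. The sketch below does so with Bayes' rule for three univariate normal classes; the means, standard deviations, and equal priors are placeholders chosen for illustration, since the abstract does not give the values used.

```python
from statistics import NormalDist

def posteriors_and_likelihood_ratios(x, dists, priors):
    """Exact a posteriori probabilities and likelihood ratios at a point x.

    dists: three NormalDist objects, one per class (assumed known here).
    priors: three a priori class probabilities summing to 1.
    Returns ([p(class 1|x), p(class 2|x), p(class 3|x)],
             (p(x|class 1)/p(x|class 3), p(x|class 2)/p(x|class 3))).
    """
    densities = [d.pdf(x) for d in dists]
    joint = [p * f for p, f in zip(priors, densities)]
    total = sum(joint)
    posteriors = [j / total for j in joint]
    # Far in the tails densities[2] can underflow to zero; not handled here.
    ratios = (densities[0] / densities[2], densities[1] / densities[2])
    return posteriors, ratios

if __name__ == "__main__":
    # Hypothetical class distributions and priors.
    classes = [NormalDist(-1.0, 1.0), NormalDist(0.0, 1.0), NormalDist(1.0, 1.0)]
    post, lrs = posteriors_and_likelihood_ratios(0.3, classes, (1 / 3, 1 / 3, 1 / 3))
    print(post, lrs)
```

Because each likelihood ratio equals a ratio of two posteriors scaled by the prior odds, small errors in an estimated posterior near zero translate into large errors in the ratio, which is consistent with the error amplification noted above.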
We have applied a Bayesian neural network (BNN) to the task of distinguishing between true-positive (TP) and false-positive (FP) detected clusters in a computer-aided diagnosis (CAD) scheme for detecting clustered microcalcifications in mammograms. Because BNNs can approximate ideal observer decision functions given sufficient training data, this approach should have better performance than our previous FP cluster elimination methods. Eight cluster-based features were extracted from the TP and FP clusters detected by the scheme in a training dataset of 39 mammograms. This set of features was used to train a BNN with eight input nodes, five hidden nodes, and one output node. The trained BNN was tested on the TP and FP clusters detected by our scheme in an independent testing set of 50 mammograms. The BNN output was analyzed using ROC and FROC analysis. The detection scheme with BNN for FP cluster elimination had substantially better cluster sensitivity at low FP rates (below 0.8 FP clusters per image) than the original detection scheme without the BNN. Our preliminary research shows that a BNN can improve the performance of our scheme for detecting clusters of microcalcifications.
We extend a method for linear template estimation developed by Abbey et al. which demonstrated that a linear observer template can be estimated effectively through a two-alternative forced choice (2AFC) experiment, assuming the noise in the images is Gaussian, or multivariate normal (MVN). We relax this assumption, allowing the noise in the images to be drawn from a weighted sum of MVN distributions, which we call a multi-peaked MVN (MPMVN) distribution. Our motivation is that more complicated probability density functions might be approximated in general by such MPMVN distributions. Our extension of Abbey et al.'s method requires us to impose the additional constraint that the covariance matrices of the component peaks of the signal-present noise distribution all be equal, and that the covariance matrices of the component peaks of the signal-absent noise distribution all be equal (but different in general from the signal-present covariance matrices). Preliminary research shows that our generalized method is capable of producing unbiased estimates of linear observer templates in the presence of MPMVN noise under the stated assumptions. We believe this extension represents a next step toward the general treatment of arbitrary image noise distributions.