The Pooled ROC Curve and Jackknifing Readers
Analysis of rating-method ROC data in which data from patients and readers are pooled has several advantages over analysis of data from individual readers: (1) it measures the performance of a diagnostic system rather than that of an average reader, (2) it is more appropriate for cost/benefit analysis, and (3) it matches the commonly used visual display of pooled results.
The choice of design and statistical technique for an experiment depends upon whether statistical generalization to readers or to patients is more fundamental, which in turn depends upon the nature of the experimental question. In studies of physical image characteristics, investigators often assume that there are no important differences among readers. Such studies are concerned with whether a physical manipulation influences image interpretability for a population of patients; thus the appropriate error term for testing differences is based on variation among patients. On the other hand, research designs in psychology tend to treat images within an experimental condition as a fixed factor with readers as a random variable. An experimental factor that might affect the reader’s perception of a radiograph, but does not change the radiograph, is a psychological factor. The basic problem in investigations of psychological variables is to study consistency of effect across readers so as to be able to generalize to new readers. The appropriate error term for testing differences related to psychological variables is based on reader variation. Because our research involves manipulation of the perception and/or cognitive behavior of the reader, our application of maximum-likelihood and jackknife methods is designed to allow experimental results to be generalized to the population of radiologists from which the sample was selected.
Perceptual accuracy of the pooled ROC curve can be analyzed by the jackknife method. In previous investigations using a relatively small number of patients, conclusions derived from pseudovalues of the jackknife method agreed with conclusions derived from estimates of the maximum-likelihood method (Berbaum et al., 1986; Berbaum, Franken, et al., 1988; Berbaum, El Khoury, et al., 1988). When the maximum-likelihood method does not converge to a solution for an individual reader's ROC curve, the most conservative approach is simply to exclude that reader’s data from further analysis. With smaller patient samples, conclusions from pseudovalues and from maximum-likelihood estimates might diverge because of possible degenerate individual ROC data or strong statistical bias in the maximum-likelihood estimates. Maximum-likelihood estimates may be biased in small samples, whereas the jackknife is a bias-reducing method of estimation.
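For reference, the standard jackknife quantities behind the pseudovalues mentioned above can be written as follows. The notation is ours and is not taken from the RSCORE-J documentation: we assume n readers, with \hat{\theta} the estimate from the data pooled over all readers and \hat{\theta}_{(i)} the estimate from the pooled data with reader i deleted.

```latex
\begin{align}
  \tilde{\theta}_i &= n\,\hat{\theta} - (n-1)\,\hat{\theta}_{(i)},
      \qquad i = 1,\dots,n
      && \text{(pseudovalues)} \\
  \hat{\theta}_J &= \frac{1}{n}\sum_{i=1}^{n} \tilde{\theta}_i
      && \text{(jackknife estimate)} \\
  \widehat{\mathrm{SE}}\bigl(\hat{\theta}_J\bigr) &=
      \sqrt{\frac{1}{n(n-1)}\sum_{i=1}^{n}
      \bigl(\tilde{\theta}_i - \hat{\theta}_J\bigr)^{2}}
      && \text{(standard error)}
\end{align}

% Bias reduction: if the bias of \hat{\theta} expands in powers of 1/n,
\begin{align}
  E\bigl[\hat{\theta}\bigr] &= \theta + \frac{c_1}{n} + \frac{c_2}{n^{2}} + \cdots
  \quad\Longrightarrow\quad
  E\bigl[\hat{\theta}_J\bigr] = \theta - \frac{c_2}{n(n-1)} + \cdots
\end{align}
```

Under that expansion the leading 1/n bias term cancels, which is the sense in which the jackknife is bias-reducing.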
RSCORE-J: Computation of Estimates and Standard Errors for Parameters of Signal Detection Theory Using Jackknife Method.
RSCORE-J is a pooled data version of RSCORE II. This program incorporates many parts of the RSCORE II program for rating method data. RSCORE II is a modified version of a program developed by Dorfman and Alf (1969).
RSCORE II employs a variant of the Newton-Raphson method, called the method of scoring, to obtain maximum-likelihood estimates of the parameters of signal detection theory for rating-method data. In the method of scoring, the expected second partial derivatives replace the observed second partial derivatives used in the Newton-Raphson method. The method of scoring requires a set of initial guesses or preliminary estimates of the parameters, but RSCORE II does not require the user to supply them: it calculates the least-squares solution for the parameter estimates from the data and uses these least-squares estimates as the initial values for the method of scoring.
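The procedure described above can be illustrated with a minimal Python sketch. It is not the RSCORE II source code. It assumes the usual binormal parameterization, with P(rating ≤ k | noise) = Φ(z_k) and P(rating ≤ k | signal) = Φ(b·z_k − a). The function names (`probs_and_grads`, `solve`, `fit`) and the rating counts are hypothetical, and the sketch omits the safeguards a production program would need, such as step halving and handling of empty end categories.

```python
from statistics import NormalDist

N = NormalDist()  # standard normal: cdf, pdf, inv_cdf

def probs_and_grads(theta, group):
    """Category probabilities and their gradients for one stimulus group.
    theta = [a, b, z_1, ..., z_{K-1}]; group 0 = noise, group 1 = signal."""
    a, b, z = theta[0], theta[1], theta[2:]
    m = len(theta)
    cum, dcum = [0.0], [[0.0] * m]          # cumulative probabilities and gradients
    for j, zj in enumerate(z):
        u = b * zj - a if group else zj
        d = [0.0] * m
        if group:
            d[0], d[1], d[2 + j] = -N.pdf(u), zj * N.pdf(u), b * N.pdf(u)
        else:
            d[2 + j] = N.pdf(u)
        cum.append(N.cdf(u))
        dcum.append(d)
    cum.append(1.0)
    dcum.append([0.0] * m)
    K = len(z) + 1
    p = [cum[k + 1] - cum[k] for k in range(K)]
    dp = [[dcum[k + 1][i] - dcum[k][i] for i in range(m)] for k in range(K)]
    return p, dp

def solve(A, y):
    """Solve A x = y by Gaussian elimination with partial pivoting."""
    n = len(y)
    M = [row[:] + [y[i]] for i, row in enumerate(A)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def fit(noise_counts, signal_counts, iters=50, tol=1e-8):
    """Maximum-likelihood fit by the method of scoring (assumes nonzero end categories)."""
    K = len(noise_counts)
    counts = [noise_counts, signal_counts]
    # Initial values: least-squares line through z-transformed cumulative proportions.
    zn, zs, cn, cs = [], [], 0, 0
    for k in range(K - 1):
        cn += noise_counts[k]
        cs += signal_counts[k]
        zn.append(N.inv_cdf(cn / sum(noise_counts)))
        zs.append(N.inv_cdf(cs / sum(signal_counts)))
    mz, ms = sum(zn) / len(zn), sum(zs) / len(zs)
    b = sum((x - mz) * (y - ms) for x, y in zip(zn, zs)) / sum((x - mz) ** 2 for x in zn)
    a = b * mz - ms                          # because zs is approximately b*zn - a
    theta = [a, b] + zn
    for _ in range(iters):
        m = len(theta)
        score = [0.0] * m                    # first derivatives of the log-likelihood
        info = [[0.0] * m for _ in range(m)] # expected (Fisher) information matrix
        for g in (0, 1):
            n_g = sum(counts[g])
            p, dp = probs_and_grads(theta, g)
            for k in range(K):
                for i in range(m):
                    score[i] += counts[g][k] / p[k] * dp[k][i]
                    for j in range(m):
                        info[i][j] += n_g * dp[k][i] * dp[k][j] / p[k]
        step = solve(info, score)            # scoring update: theta + info^{-1} * score
        theta = [t + s for t, s in zip(theta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return theta

# Hypothetical rating counts, categories ordered from "surely noise" to "surely signal".
theta = fit([38, 25, 15, 10, 12], [8, 12, 15, 25, 40])
a, b = theta[0], theta[1]
print("a =", round(a, 3), "b =", round(b, 3),
      "Az =", round(N.cdf(a / (1 + b * b) ** 0.5), 3))
```

Using the expected information in place of the observed second derivatives keeps the matrix positive definite for multinomial data such as these, so each update is an ascent direction; the area under the fitted binormal curve is Φ(a / √(1 + b²)).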
With pooled data from a group of observers, it is assumed that the pooled observers are homogeneous with regard to sensitivity and decision criteria. However, statistical inferences may not be robust with regard to even mild violations of this assumption. RSCORE-J applies the jackknife technique to rating data pooled from a group of observers to obtain estimates and standard errors for the parameters of a rating-method ROC curve of pooled data, and to reduce the statistical bias of such estimates so as to permit robust statistical inference.
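The leave-one-reader-out idea can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the RSCORE-J implementation. To stay short it substitutes the nonparametric (Mann-Whitney) area under the pooled ROC curve for the maximum-likelihood parameter estimates that RSCORE-J computes. The function names and the ratings are hypothetical.

```python
from math import sqrt

def pooled_auc(readers):
    """Mann-Whitney area under the ROC curve for ratings pooled over readers.
    readers: list of (noise_ratings, signal_ratings) tuples, one per reader.
    Stand-in estimator; RSCORE-J itself uses maximum-likelihood estimates."""
    noise = [r for nz, _ in readers for r in nz]
    signal = [r for _, sg in readers for r in sg]
    wins = sum((s > n) + 0.5 * (s == n) for s in signal for n in noise)
    return wins / (len(signal) * len(noise))

def jackknife_readers(readers, estimator=pooled_auc):
    """Jackknife over readers: delete one reader at a time from the pooled data."""
    n = len(readers)
    full = estimator(readers)                      # estimate from all readers pooled
    pseudo = []
    for i in range(n):
        loo = estimator(readers[:i] + readers[i + 1:])   # reader i deleted
        pseudo.append(n * full - (n - 1) * loo)          # pseudovalue for reader i
    mean = sum(pseudo) / n                         # jackknife estimate
    se = sqrt(sum((p - mean) ** 2 for p in pseudo) / (n * (n - 1)))
    return mean, se, pseudo

# Hypothetical 5-point ratings from four readers: (noise cases, signal cases).
readers = [
    ([1, 1, 2, 2, 3, 4], [2, 3, 4, 4, 5, 5]),
    ([1, 2, 2, 3, 3, 3], [3, 3, 4, 5, 5, 5]),
    ([1, 1, 1, 2, 4, 5], [1, 3, 3, 4, 4, 5]),
    ([2, 2, 3, 3, 4, 4], [2, 4, 4, 5, 5, 5]),
]
estimate, std_err, pseudovalues = jackknife_readers(readers)
print("jackknife estimate:", round(estimate, 4), "SE:", round(std_err, 4))
```

Because each pseudovalue is tied to one reader, the spread of the pseudovalues reflects reader-to-reader variation, which is the error term the surrounding text argues is appropriate when generalizing to new readers.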
Berbaum KS, Franken Jr EA, Dorfman DD, Barloon TJ, Ell SR, Lu CH, Smith WL, Abu-Yousef MM. Tentative diagnoses facilitate the detection of diverse lesions in chest radiographs. Investigative Radiology 1986;21:532-539.
Berbaum KS, Franken EA, Dorfman DD, Barloon TJ. Influence of clinical history upon detection of nodules and other lesions. Investigative Radiology 1988;23:48-55.
Berbaum KS, El Khoury GE, Franken EA, Kathol MC, Montgomery WJ, Hesson W. The impact of clinical history on detection of fractures. Radiology 1988;168:507-511.
Dorfman DD. RSCORE II. In J. A. Swets & R. M. Pickett, Evaluation of diagnostic systems: Methods from signal detection theory. New York: Academic Press, 1982:212-232.
Dorfman DD, Alf E Jr. Maximum likelihood estimation of parameters of signal detection theory and determination of confidence intervals-Rating method data. Journal of Mathematical Psychology 1969;6:487-496.
Dorfman DD, Berbaum KS. RSCORE-J: pooled rating-method data: a computer program for analyzing pooled ROC curves. Behavior Research Methods, Instruments, and Computers 1986;18:452-462.
RSCORE-J (2006 Version) Download
RSCORE-J Version 2006 (189 KB): program, manual, and sample data.