17.10.2 Algorithms (ROC Curve)

The following notation is used in this section.

\(x_i\,\!\) : Test result score for case \(i\,\!\)

\(n_{TP}\,\!\) : Number of true positive decisions

\(n_{FN}\,\!\) : Number of false negative decisions

\(n_{TN}\,\!\) : Number of true negative decisions

\(n_{FP}\,\!\) : Number of false positive decisions

\(n_{-}\,\!\): Number of cases with negative actual state

\(n_{+}\,\!\): Number of cases with positive actual state

\(n_{-=j}\,\!\): Number of true negative cases with test results equal to \(j\,\!\)

\(n_{+>j}\,\!\): Number of true positive cases with test results greater than \(j\,\!\)

\(n_{+=j}\,\!\): Number of true positive cases with test results equal to \(j\,\!\)

\(n_{-<j}\,\!\): Number of true negative cases with test results less than \(j\,\!\)


ROC Values

1 - Specificity (X): \(1-\frac{n_{TN}}{n_{TN}+n_{FP}}\,\!\)

Sensitivity (Y): \(\frac{n_{TP}}{n_{TP}+n_{FN}}\,\!\)
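As an illustration, one ROC point can be computed from the four decision counts as follows. This is a minimal sketch; the function name `roc_point` and the example counts are hypothetical and not part of the original description.

```python
def roc_point(n_tp, n_fn, n_tn, n_fp):
    """Return (x, y) = (1 - specificity, sensitivity) for one cutoff."""
    specificity = n_tn / (n_tn + n_fp)
    sensitivity = n_tp / (n_tp + n_fn)
    return 1.0 - specificity, sensitivity


# Hypothetical counts: 50 positive cases and 50 negative cases.
x, y = roc_point(n_tp=40, n_fn=10, n_tn=45, n_fp=5)
print(x, y)
```

Repeating this for every candidate cutoff gives the set of (X, Y) points that make up the ROC curve.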

The area under the ROC curve

Let \(x\,\!\) be the scale of the test result variable. Denote by \(x_{-}\,\!\) the \(x\,\!\) values for cases with negative actual states and by \(x_{+}\,\!\) the values for cases with positive actual states. Then the nonparametric approximation of the "true" area under the ROC curve, \(\theta \,\!\), is

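\[A_Z=\frac 1{n_{+}n_{-}}\sum_{j=1}^{n_{-}}\sum _{i=1}^{n_{+}}\Psi (x_{+i},x_{-j})\]

where \(x_{+i}\,\!\) is the score of the \(i\,\!\)-th positive case, \(x_{-j}\,\!\) is the score of the \(j\,\!\)-th negative case, \(n_{+}\,\!\) is the sample size of \(D\,\!\)+ (cases with positive actual state), \(n_{-}\,\!\) is the sample size of \(D\,\!\)- (cases with negative actual state), and

\[\Psi (x_{+},x_{-})= \begin{cases} 1, & \mbox{if }x_{+}>x_{-} \\ 0.5, & \mbox{if }x_{+}=x_{-} \\ 0, & \mbox{if }x_{+}<x_{-} \end{cases}\]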

Note that \(A_Z\,\!\) is the observed area under the ROC curve when successive points are connected by straight lines, i.e., the area computed by the trapezoidal rule.
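The double sum can be evaluated directly by comparing every positive score with every negative score. The following is a minimal sketch; the function name `auc_pairwise` and the sample scores are hypothetical.

```python
def auc_pairwise(pos_scores, neg_scores):
    """Average of Psi over all (positive, negative) score pairs."""
    total = 0.0
    for x_pos in pos_scores:
        for x_neg in neg_scores:
            if x_pos > x_neg:
                total += 1.0      # positive case ranked above negative case
            elif x_pos == x_neg:
                total += 0.5      # tie counts as half
            # x_pos < x_neg contributes 0
    return total / (len(pos_scores) * len(neg_scores))


# Hypothetical test scores for 4 positive and 4 negative cases.
pos = [3, 4, 4, 5]
neg = [1, 2, 3, 4]
print(auc_pairwise(pos, neg))  # 13.5 / 16 = 0.84375
```

This form costs one comparison per pair, so it is mainly useful for small samples or for checking a faster implementation.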

An alternative way to compute \(A_Z\,\!\), summing over the distinct test result values \(j\,\!\), is as follows:

\[A_Z=\frac 1{n_{+}n_{-}}\sum_{j} \left\{ n_{-=j}n_{+>j}+\frac{n_{-=j}n_{+=j}}2\right\} \]
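The tally form can be sketched as below. The sketch normalizes by the product of the two sample sizes so that the result agrees with the pairwise definition of the area; the function name `auc_tally` and the sample scores are hypothetical.

```python
from collections import Counter


def auc_tally(pos_scores, neg_scores):
    """Area under the ROC curve from per-value tallies of the scores."""
    pos_count = Counter(pos_scores)
    neg_count = Counter(neg_scores)
    total = 0.0
    for j, n_neg_eq in neg_count.items():
        # Positive cases scoring strictly above j, and exactly at j.
        n_pos_gt = sum(c for v, c in pos_count.items() if v > j)
        n_pos_eq = pos_count.get(j, 0)
        total += n_neg_eq * n_pos_gt + n_neg_eq * n_pos_eq / 2
    return total / (len(pos_scores) * len(neg_scores))


# Hypothetical test scores for 4 positive and 4 negative cases.
print(auc_tally([3, 4, 4, 5], [1, 2, 3, 4]))  # 13.5 / 16 = 0.84375
```

Only score values that occur among the negative cases contribute, because every term is multiplied by the count of negative cases at that value.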

The SE of the area under the ROC curve statistic

The standard error of \(A_Z\,\!\) is estimated by:

\[SE(A_Z)=\sqrt{\frac{A_Z(1-A_Z)+(n_{+}-1)(Q_1-A_Z^2)+(n_{-}-1)(Q_2-A_Z^2)}{n_{+}n_{-}}} \,\!\]

where

\[Q_1=\frac 1{n_{-}n_{+}^2}\sum_{j} n_{-=j}\left[ n_{+>j}^2+n_{+>j}n_{+=j}+\frac{n_{+=j}^2}3\right] \,\!\]

and

\[Q_2=\frac 1{n_{-}^2n_{+}}\sum_{j} n_{+=j}\left[ n_{-<j}^2+n_{-<j}n_{-=j}+\frac{n_{-=j}^2}3\right] \,\!\]
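The standard error calculation can be sketched as follows. The sketch assumes that \(Q_1\) weights each negative score value by the tallies of positive cases above and at that value (\(n_{+>j}\), \(n_{+=j}\)), and that \(Q_2\) weights each positive score value by the tallies of negative cases below and at that value (\(n_{-<j}\), \(n_{-=j}\)), as defined in the notation list. The function name `auc_and_se` and the sample scores are hypothetical.

```python
from collections import Counter
from math import sqrt


def auc_and_se(pos_scores, neg_scores):
    """Return (A_Z, SE(A_Z)) computed from per-value tallies."""
    pos_count, neg_count = Counter(pos_scores), Counter(neg_scores)
    n_pos, n_neg = len(pos_scores), len(neg_scores)
    auc_sum = q1_sum = q2_sum = 0.0
    for j in set(pos_count) | set(neg_count):
        p_eq = pos_count.get(j, 0)                              # n_{+=j}
        n_eq = neg_count.get(j, 0)                              # n_{-=j}
        p_gt = sum(c for v, c in pos_count.items() if v > j)    # n_{+>j}
        n_lt = sum(c for v, c in neg_count.items() if v < j)    # n_{-<j}
        auc_sum += n_eq * p_gt + n_eq * p_eq / 2
        q1_sum += n_eq * (p_gt ** 2 + p_gt * p_eq + p_eq ** 2 / 3)
        q2_sum += p_eq * (n_lt ** 2 + n_lt * n_eq + n_eq ** 2 / 3)
    a = auc_sum / (n_pos * n_neg)
    q1 = q1_sum / (n_neg * n_pos ** 2)
    q2 = q2_sum / (n_neg ** 2 * n_pos)
    var = (a * (1 - a)
           + (n_pos - 1) * (q1 - a * a)
           + (n_neg - 1) * (q2 - a * a)) / (n_pos * n_neg)
    # Guard against a tiny negative value caused by rounding.
    return a, sqrt(max(var, 0.0))


# Hypothetical test scores for 4 positive and 4 negative cases.
area, se = auc_and_se([3, 4, 4, 5], [1, 2, 3, 4])
print(area, se)
```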

The asymptotic confidence interval of the area under the ROC curve

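A 2-sided asymptotic \(c\%=(100-\alpha )\%\,\!\) confidence interval for the true area under the ROC curve is

\[A_Z\pm z_{\alpha /200}\,SE(A_Z)\,\!\]

where \(z_{\alpha /200}\,\!\) is the upper \(\alpha /200\,\!\) critical value of the standard normal distribution, with \(\alpha \,\!\) expressed in percent (for example, \(\alpha =5\,\!\) gives \(z\approx 1.96\,\!\)).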
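A minimal sketch of the interval is given below. It assumes that the half-width is the standard error multiplied by the two-sided standard normal critical value, with \(\alpha\) given in percent. The function name `auc_confidence_interval` and the input values are hypothetical.

```python
from statistics import NormalDist


def auc_confidence_interval(area, se, alpha_percent=5.0):
    """Two-sided asymptotic (100 - alpha)% interval for the area."""
    # Upper alpha/200 critical value of the standard normal distribution.
    z = NormalDist().inv_cdf(1.0 - alpha_percent / 200.0)
    return area - z * se, area + z * se


# Hypothetical inputs: an observed area and its standard error.
lower, upper = auc_confidence_interval(0.84375, 0.14678, alpha_percent=5.0)
print(lower, upper)
```

The sketch does not clip the limits, so with small samples the upper limit can exceed 1 even though the true area cannot.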

The asymptotic P-value under the null hypothesis that \( \theta=0.5\ \,\!\) vs. the alternative hypothesis that \( \theta \neq 0.5\ \,\!\)

Since \(A_Z\,\!\) is asymptotically normal under the null hypothesis that \( \theta=0.5\ \,\!\), the asymptotic P-value for testing this null hypothesis against the alternative hypothesis that \( \theta \neq 0.5\ \,\!\) is:

\[P\left( \left| Z\right| >\left| \frac{A_Z-0.5}{SD(A_Z)|_{\theta =0.5}}\right| \right) =2P\left( Z>\left| \frac{A_Z-0.5}{SD(A_Z)\mid _{\theta =0.5}}\right| \right) \]

In the nonparametric case,

\[SD(A_Z)|_{\theta =0.5}=\sqrt{\frac{\theta (1-\theta )+(n_{+}-1)(Q_1-\theta ^2)+(n_{-}-1)(Q_2-\theta ^2)}{n_{+}n_{-}}}|_{\theta =0.5}\,\!\]

\[=\sqrt{\frac{0.5(1-0.5)+(n_{+}-1)(\frac 13-0.5^2)+(n_{-}-1)(\frac 13-0.5^2)}{n_{+}n_{-}}} \]
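The null standard deviation and the resulting two-sided P-value can be sketched as follows. Algebraically, the expression under the square root reduces to \((n_{+}+n_{-}+1)/(12n_{+}n_{-})\). The function names `null_sd` and `auc_p_value` and the input values are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def null_sd(n_pos, n_neg):
    """SD of A_Z under theta = 0.5, where Q_1 = Q_2 = 1/3."""
    numerator = (0.5 * (1 - 0.5)
                 + (n_pos - 1) * (1 / 3 - 0.5 ** 2)
                 + (n_neg - 1) * (1 / 3 - 0.5 ** 2))
    return sqrt(numerator / (n_pos * n_neg))


def auc_p_value(area, n_pos, n_neg):
    """Two-sided asymptotic P-value for H0: theta = 0.5."""
    z = abs(area - 0.5) / null_sd(n_pos, n_neg)
    return 2.0 * (1.0 - NormalDist().cdf(z))


# Hypothetical inputs: observed area with 4 positive and 4 negative cases.
print(auc_p_value(0.84375, n_pos=4, n_neg=4))
```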

Optimal Cut-Point Value

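The optimal cut-point is the cutoff at which sensitivity and specificity are as close to equal as possible (the SpEqualSe criterion). With x = 1 - specificity and y = sensitivity, this is the point on the ROC curve that attains min( abs(1-x-y) ).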
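A minimal sketch of this selection is shown below. It assumes the ROC curve is available as a list of (cutoff, x, y) tuples; the function name `optimal_cut_point` and the example points are hypothetical.

```python
def optimal_cut_point(roc_points):
    """Pick the (cutoff, x, y) tuple that minimizes abs(1 - x - y).

    x is 1 - specificity and y is sensitivity, so abs(1 - x - y) is the
    absolute difference between specificity and sensitivity.
    """
    return min(roc_points, key=lambda p: abs(1 - p[1] - p[2]))


# Hypothetical ROC points given as (cutoff, x, y).
points = [
    (1, 1.00, 1.00),
    (2, 0.75, 1.00),
    (3, 0.50, 1.00),
    (4, 0.25, 0.75),
    (5, 0.00, 0.25),
]
print(optimal_cut_point(points))  # (4, 0.25, 0.75)
```

If several points share the same minimum, `min` returns the first one in the list, so the order of the points decides ties in this sketch.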