17.10.2 Algorithms (ROC Curve)
The following notation is used in this section.
\(x_i\,\!\) : Test result score for case \(i\,\!\)
\(n_{TP}\,\!\) : Number of true positive decisions
\(n_{FN}\,\!\) : Number of false negative decisions
\(n_{TN}\,\!\) : Number of true negative decisions
\(n_{FP}\,\!\) : Number of false positive decisions
\(n_{-}\,\!\): Number of cases with negative actual state
\(n_{+}\,\!\): Number of cases with positive actual state
\(n_{-=j}\,\!\): Number of cases with negative actual state and test result equal to \(j\,\!\)
\(n_{+>j}\,\!\): Number of cases with positive actual state and test result greater than \(j\,\!\)
\(n_{+=j}\,\!\): Number of cases with positive actual state and test result equal to \(j\,\!\)
\(n_{-<j}\,\!\): Number of cases with negative actual state and test result less than \(j\,\!\)
ROC Values
1 - Specificity (X): \(1-\frac{n_{TN}}{n_{TN}+n_{FP}}\,\!\)
Sensitivity (Y): \(\frac{n_{TP}}{n_{TP}+n_{FN}}\,\!\)
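As an illustration, one ROC point can be computed from the four decision counts at a given cut-off. This is a minimal sketch; the function name `roc_point` and the example counts are hypothetical and not part of the original text.

```python
def roc_point(n_tp, n_fn, n_tn, n_fp):
    """Return (X, Y) = (1 - specificity, sensitivity) for one cut-off."""
    sensitivity = n_tp / (n_tp + n_fn)   # Y coordinate
    specificity = n_tn / (n_tn + n_fp)
    return 1.0 - specificity, sensitivity  # X coordinate, Y coordinate


# Hypothetical counts: 50 positive cases and 50 negative cases.
x, y = roc_point(n_tp=40, n_fn=10, n_tn=45, n_fp=5)
print(x, y)
```

Repeating this for every candidate cut-off gives the full set of points on the ROC curve.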
The area under the ROC curve
Let \(x\,\!\) be the scale of the test result variable. Denote by \(x_{-}\,\!\) the \(x\,\!\) values for cases with negative actual states and by \(x_{+}\,\!\) the values for cases with positive actual states. Then the nonparametric approximation of the "true" area under the ROC curve, \(\theta \,\!\), is
\[A_Z=\frac 1{n_{+}n_{-}}\sum_{j=1}^{n_{-}}\sum _{i=1}^{n_{+}}\Psi (x_{+},x_{-})\]
where \(n_{+}\,\!\) is the number of cases with positive actual state (\(D\,\!\)+), \(n_{-}\,\!\) is the number of cases with negative actual state (\(D\,\!\)-), and
\[\Psi (x_{+},x_{-})= \begin{cases} 1, & \mbox{if }x_{+}>x_{-} \\ 0.5, & \mbox{if }x_{+}=x_{-} \\ 0, & \mbox{if }x_{+}<x_{-} \end{cases}\]
Note that \(A_Z\,\!\) is the observed area under the ROC curve obtained by connecting successive points with straight lines, i.e., by the trapezoidal rule.
An alternative way to compute \(A_Z\,\!\) is to sum over the distinct test result values \(j\,\!\):
\[A_Z=\frac 1{n_{+}n_{-}}\sum_j \left\{ n_{-=j}n_{+>j}+\frac{n_{-=j}n_{+=j}}2\right\} \]
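The two ways of computing the area can be checked against each other on a small data set. The sketch below assumes that larger scores indicate the positive state and that the grouped sum is normalized by the product of the two sample sizes, so that it agrees with the pairwise definition. The function names and the sample scores are hypothetical.

```python
from collections import Counter


def auc_pairwise(pos, neg):
    """Average the kernel Psi over all (positive, negative) pairs."""
    total = 0.0
    for xp in pos:
        for xn in neg:
            if xp > xn:
                total += 1.0
            elif xp == xn:
                total += 0.5
    return total / (len(pos) * len(neg))


def auc_grouped(pos, neg):
    """Same area, summed over the distinct score values j of the negatives."""
    pos_counts, neg_counts = Counter(pos), Counter(neg)
    total = 0.0
    for j, n_neg_eq in neg_counts.items():
        n_pos_gt = sum(c for v, c in pos_counts.items() if v > j)
        n_pos_eq = pos_counts.get(j, 0)
        total += n_neg_eq * n_pos_gt + n_neg_eq * n_pos_eq / 2
    return total / (len(pos) * len(neg))


pos_scores = [3, 4, 4, 5]   # cases with positive actual state
neg_scores = [1, 2, 3, 4]   # cases with negative actual state
print(auc_pairwise(pos_scores, neg_scores))  # 0.84375
print(auc_grouped(pos_scores, neg_scores))   # 0.84375
```

The grouped form avoids the double loop over all pairs, which matters when the scores take only a few distinct values.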
The SE of the area under the ROC curve statistic
The standard error of \(A_Z\,\!\) is estimated by:
\[SE(A_Z)=\sqrt{\frac{A_Z(1-A_Z)+(n_{+}-1)(Q_1-A_Z^2)+(n_{-}-1)(Q_2-A_Z^2)}{n_{+}n_{-}}} \,\!\]
where
\[Q_1=\frac 1{n_{-}n_{+}^2}\sum_j n_{-=j}\left[ n_{+>j}^2+n_{+>j}n_{+=j}+\frac{n_{+=j}^2}3\right] \,\!\]
and
\[Q_2=\frac 1{n_{-}^2n_{+}}\sum_j n_{+=j}\left[ n_{-<j}^2+n_{-<j}n_{-=j}+\frac{n_{-=j}^2}3\right] \,\!\]
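The standard error can be sketched in code as follows. The sketch assumes that the last term of \(Q_1\,\!\) is \(n_{+=j}^2/3\,\!\) and that \(Q_2\,\!\) uses the counts \(n_{-<j}\,\!\) from the notation list, which keeps the two quantities symmetric. The function name `auc_and_se` and the sample scores are hypothetical.

```python
from collections import Counter
from math import sqrt


def auc_and_se(pos, neg):
    """Return (A_Z, SE(A_Z)) from raw scores, using grouped counts."""
    n_pos, n_neg = len(pos), len(neg)
    pos_counts, neg_counts = Counter(pos), Counter(neg)

    # Sums over the distinct score values j observed among the negatives.
    area_sum = q1_sum = 0.0
    for j, n_neg_eq in neg_counts.items():
        n_pos_gt = sum(c for v, c in pos_counts.items() if v > j)
        n_pos_eq = pos_counts.get(j, 0)
        area_sum += n_neg_eq * (n_pos_gt + n_pos_eq / 2)
        q1_sum += n_neg_eq * (n_pos_gt**2 + n_pos_gt * n_pos_eq + n_pos_eq**2 / 3)

    # Sum over the distinct score values j observed among the positives.
    q2_sum = 0.0
    for j, n_pos_eq in pos_counts.items():
        n_neg_lt = sum(c for v, c in neg_counts.items() if v < j)
        n_neg_eq = neg_counts.get(j, 0)
        q2_sum += n_pos_eq * (n_neg_lt**2 + n_neg_lt * n_neg_eq + n_neg_eq**2 / 3)

    a_z = area_sum / (n_pos * n_neg)
    q1 = q1_sum / (n_neg * n_pos**2)
    q2 = q2_sum / (n_neg**2 * n_pos)
    var = (a_z * (1 - a_z)
           + (n_pos - 1) * (q1 - a_z**2)
           + (n_neg - 1) * (q2 - a_z**2)) / (n_pos * n_neg)
    # Guard against tiny negative values caused by floating-point rounding.
    return a_z, sqrt(max(var, 0.0))


a_z, se = auc_and_se([3, 4, 4, 5], [1, 2, 3, 4])
print(a_z, se)
```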
The asymptotic confidence interval of the area under the ROC curve
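A two-sided asymptotic \(c\%=(100-\alpha )\%\,\!\) confidence interval for the true area under the ROC curve is
\[A_Z\pm z_{\alpha /200}\,SE(A_Z)\,\!\]
where \(z_{\alpha /200}\,\!\) is the upper \(\alpha /200\,\!\) critical value of the standard normal distribution (\(\alpha \,\!\) is expressed in percent).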
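A minimal sketch of the interval, assuming it is built from the standard normal critical value for the chosen confidence level. The function name, its parameters, and the example inputs are hypothetical. `NormalDist` comes from the Python standard library.

```python
from statistics import NormalDist


def auc_confidence_interval(a_z, se, conf_level=95.0):
    """Two-sided asymptotic interval; conf_level is c in percent."""
    alpha = 100.0 - conf_level                     # alpha in percent
    z = NormalDist().inv_cdf(1.0 - alpha / 200.0)  # upper alpha/200 point
    return a_z - z * se, a_z + z * se


# Hypothetical area and standard error.
lower, upper = auc_confidence_interval(a_z=0.8, se=0.05, conf_level=95.0)
print(lower, upper)
```

With small samples the upper limit of this interval can exceed 1, because the normal approximation does not respect the range of the area.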
The asymptotic P-value under the null hypothesis that \( \theta=0.5\ \,\!\) vs. the alternative hypothesis that \( \theta \neq 0.5\ \,\!\)
Since \(A_Z\,\!\) is asymptotically normal under the null hypothesis that \( \theta=0.5\ \,\!\), the asymptotic two-sided P-value against the alternative hypothesis that \( \theta \neq 0.5\ \,\!\) is:
\[P\left( \left| Z\right| >\left| \frac{A_Z-0.5}{SD(A_Z)|_{\theta =0.5}}\right| \right) =2P\left( Z>\left| \frac{A_Z-0.5}{SD(A_Z)\mid _{\theta =0.5}}\right| \right) \]
In the nonparametric case,
\[SD(A_Z)|_{\theta =0.5}=\sqrt{\frac{\theta (1-\theta )+(n_{+}-1)(Q_1-\theta ^2)+(n_{-}-1)(Q_2-\theta ^2)}{n_{+}n_{-}}}|_{\theta =0.5}\,\!\]
\[=\sqrt{\frac{0.5(1-0.5)+(n_{+}-1)(\frac 13-0.5^2)+(n_{-}-1)(\frac 13-0.5^2)}{n_{+}n_{-}}} \]
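The test can be sketched as below. The null standard deviation is evaluated exactly as in the last expression above, which simplifies algebraically to \(\sqrt{(n_{+}+n_{-}+1)/(12\,n_{+}n_{-})}\,\!\). The function names and the example inputs are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def null_sd(n_pos, n_neg):
    """SD of A_Z under theta = 0.5, with Q1 = Q2 = 1/3."""
    theta = 0.5
    num = (theta * (1 - theta)
           + (n_pos - 1) * (1 / 3 - theta**2)
           + (n_neg - 1) * (1 / 3 - theta**2))
    return sqrt(num / (n_pos * n_neg))


def auc_p_value(a_z, n_pos, n_neg):
    """Two-sided asymptotic P-value for H0: theta = 0.5."""
    z = (a_z - 0.5) / null_sd(n_pos, n_neg)
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


# Hypothetical example: area 0.84375 from 4 positive and 4 negative cases.
p = auc_p_value(0.84375, n_pos=4, n_neg=4)
print(null_sd(4, 4), p)
```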
Optimal Cut-Point Value
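The optimal cut-point is the point on the ROC curve at which specificity and sensitivity are as close to equal as possible (the SpEqualSe criterion). With \(x\,\!\) = 1 - specificity and \(y\,\!\) = sensitivity, the difference between specificity and sensitivity is \(1-x-y\,\!\), so the cut-point is the one that attains min( abs(1-x-y) ) over the points of the ROC curve.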
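The selection rule can be sketched as a search over the ROC points. The function name, the tuple layout `(cutoff, x, y)`, and the example points are hypothetical. When several points tie, this sketch returns the first one, which is an assumption and not something the text specifies.

```python
def optimal_cut_point(points):
    """Pick the (cutoff, x, y) tuple that minimizes abs(1 - x - y).

    x is 1 - specificity and y is sensitivity at that cutoff.
    """
    return min(points, key=lambda p: abs(1.0 - p[1] - p[2]))


# Hypothetical ROC points: (cutoff, 1 - specificity, sensitivity).
roc_points = [
    (1, 1.00, 1.00),
    (2, 0.75, 1.00),
    (3, 0.50, 1.00),
    (4, 0.25, 0.75),
    (5, 0.00, 0.25),
]
best = optimal_cut_point(roc_points)
print(best)  # (4, 0.25, 0.75)
```

At the selected point the specificity (1 - 0.25 = 0.75) equals the sensitivity (0.75), so abs(1-x-y) is zero.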