2.5.2.2 Algorithm: Nonparametric Distribution Analysis (Arbitrary Censoring)
Uncensored and arbitrarily censored data are represented as time intervals \((tl_i, tr_i)\):
- \(tl_i\): lower bound (time of last inspection or last known survival)
- \(tr_i\): upper bound (time when failure was first detected)
- If \(tr_i = \infty\), it represents right-censoring.
- If \(tl_i = tr_i\), it represents exact failure (uncensored).
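The interval convention above can be illustrated with a small sketch. The function name `censoring_type` and the sample observations are hypothetical, and any finite interval with \(tl_i < tr_i\) is simply labeled interval-censored here:

```python
import math

def censoring_type(tl, tr):
    """Classify one observation given as the interval (tl, tr)."""
    if tl > tr:
        raise ValueError("lower bound exceeds upper bound")
    if math.isinf(tr):
        return "right-censored"      # failure not yet observed
    if tl == tr:
        return "exact"               # uncensored failure time
    return "interval-censored"       # failure somewhere between tl and tr

# Hypothetical observations: exact, interval-censored, right-censored
observations = [(5.0, 5.0), (3.0, 8.0), (10.0, math.inf)]
labels = [censoring_type(tl, tr) for tl, tr in observations]
```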
Turnbull estimation method
The time intervals and the probability assigned to each interval are computed with the Turnbull estimator [1]. Turnbull developed an iterative algorithm to obtain the nonparametric maximum likelihood estimate (NPMLE) of the cumulative distribution function for censored data. This approach applies to general censoring patterns, including cases where the observation intervals overlap.
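As a rough illustration of the iterative idea, the following sketch runs a self-consistency (EM-style) update, one common way to compute the NPMLE. It assumes the candidate intervals have already been identified and that `alpha[i][j]` is 1 when candidate interval \(j\) lies inside observation \(i\) and 0 otherwise. The names `turnbull_em` and `alpha`, the tolerance, and the sample data are assumptions made for this example, not the software's actual implementation:

```python
def turnbull_em(alpha, tol=1e-10, max_iter=10_000):
    """Self-consistency iteration for interval probabilities p_j.

    alpha: list of rows, one per observation; each row must contain
           at least one 1 (the observation covers some candidate interval).
    """
    n, m = len(alpha), len(alpha[0])
    p = [1.0 / m] * m                      # start from a uniform guess
    for _ in range(max_iter):
        new_p = [0.0] * m
        for row in alpha:
            # probability mass the current estimate puts on this observation
            denom = sum(a * pj for a, pj in zip(row, p))
            for j in range(m):
                # expected share of this observation falling in interval j
                new_p[j] += row[j] * p[j] / denom
        new_p = [v / n for v in new_p]
        if max(abs(a - b) for a, b in zip(new_p, p)) < tol:
            return new_p
        p = new_p
    return p

# Hypothetical data: one observation overlaps both candidate intervals,
# one falls in the first interval, two fall in the second.
alpha = [[1, 1], [1, 0], [0, 1], [0, 1]]
p_hat = turnbull_em(alpha)
```

In this small example the overlapping observation carries no information about how mass is split, so the estimate approaches \(p = (1/3, 2/3)\). Convergence can be slow when the maximum lies on the boundary (some \(p_j = 0\)).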
The covariance of the constrained MLE is computed from the observed Fisher information matrix in the reduced parameter space [2], under the constraint \(\sum_j p_j=1\).
Actuarial estimation method
First, a clinical life table is constructed, with each interval \([t_i,t_{i+1})\) representing the range into which survival times and times to loss or withdrawal are distributed. Each interval spans from \(t_i\) up to, but not including, \(t_{i+1}\), for \(i=1,2,\ldots,s\). The final interval extends to infinity. These intervals are considered fixed. Using the input censoring information, we can tabulate the basic data required for the calculation.
- \(t_{mi}\) The midpoint of the \(i\)th interval.
- \( b_i=t_{i+1}-t_i \) The width of the \(i\)th interval.
- \(l_i\) Number lost or withdrawn alive in the \(i\)th interval.
- \(d_i\) Number dying in the \(i\)th interval.
- \( n_i'=n_{i-1}' - l_{i-1} - d_{i-1} \) Number entering the \(i\)th interval.
- \( n_i=n_i' - \frac{1}{2}l_{i} \) Number exposed to risk in the \(i\)th interval.
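The recursion for \(n_i'\) and the half-withdrawal adjustment for \(n_i\) can be sketched as follows. The function name `exposure_table`, the starting count, and the sample counts are hypothetical:

```python
def exposure_table(n_first, lost, died):
    """Return (entering, exposed) lists, one entry per interval.

    n_first: number entering the first interval
    lost:    l_i, number lost or withdrawn alive in each interval
    died:    d_i, number dying in each interval
    """
    entering, exposed = [], []
    n_enter = n_first
    for l_i, d_i in zip(lost, died):
        entering.append(n_enter)
        exposed.append(n_enter - 0.5 * l_i)   # n_i = n_i' - l_i / 2
        n_enter = n_enter - l_i - d_i         # n_{i+1}' for the next interval
    return entering, exposed

# Hypothetical counts for three intervals
lost = [2, 4, 0]
died = [10, 6, 8]
entering, exposed = exposure_table(30, lost, died)
```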
Actuarial Table
Conditional Probability of Failure
- \[ \hat{q}_{i}=d_i/n_i \]
Variance of Conditional Probability of Failure
- \[ Var[\hat{q}_{i}]=\frac{\hat{q}_{i}(1-\hat{q}_{i})}{n_i} \]
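A minimal sketch of these two formulas, applied interval by interval. The function name `conditional_failure` and the sample counts are assumptions for illustration:

```python
def conditional_failure(died, exposed):
    """Return lists of q_i = d_i / n_i and Var[q_i] = q_i (1 - q_i) / n_i."""
    q = [d_i / n_i for d_i, n_i in zip(died, exposed)]
    var_q = [q_i * (1.0 - q_i) / n_i for q_i, n_i in zip(q, exposed)]
    return q, var_q

# Hypothetical counts: deaths and numbers exposed to risk in two intervals
q, var_q = conditional_failure([4, 4], [16.0, 8.0])
```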
Survival Probabilities
Cumulative Survival Probabilities
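- \[\hat{S}(t_i)= \begin{cases} 1 & \text{if } i = 1, \\ \hat{p}_{i-1}\hat{S}(t_{i-1}) & \text{if } i = 2,...,k \end{cases} \]
where \(\hat{p}_i = 1-\hat{q}_i\) is the conditional probability of surviving the \(i\)th interval.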
Variance of Cumulative Survival Probabilities
- \[Var[\hat{S}(t_i)] = [\hat{S}(t_i)]^2\sum_{j=1}^{i-1}\frac{\hat{q}_j}{n_j\hat{p}_j} \]
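The survival recursion and its variance can be computed in a single pass, carrying a running sum of \(\hat{q}_j/(n_j\hat{p}_j)\). This is a sketch under the assumption that every \(\hat{p}_j > 0\); the function name `cumulative_survival` and the inputs are hypothetical:

```python
def cumulative_survival(q, exposed):
    """Return S(t_i) and Var[S(t_i)] at the start of each interval."""
    surv, var = [], []
    s = 1.0      # S(t_1) = 1
    acc = 0.0    # running sum of q_j / (n_j p_j) for j < i
    for q_i, n_i in zip(q, exposed):
        surv.append(s)
        var.append(s * s * acc)
        p_i = 1.0 - q_i                 # assumed to be strictly positive
        acc += q_i / (n_i * p_i)
        s *= p_i                        # S(t_{i+1}) = p_i S(t_i)
    return surv, var

# Hypothetical conditional failure probabilities and exposures
surv, var_surv = cumulative_survival([0.25, 0.5, 0.5], [16.0, 8.0, 4.0])
```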
Hazards and Densities
Hazard Estimates
- \[\hat{h}(t_{mi})=\frac{2\hat{q}_i}{b_i(1+\hat{p}_i)}\]
Variance of Hazards
- \[Var[\hat{h}(t_{mi})] = \frac{(\hat{h}(t_{mi}))^2}{n_i\hat{q}_i}(1-(\frac{1}{2}\hat{h}(t_{mi})b_i)^2)\]
Probability Density Estimates
- \[\hat{f}(t_{mi})=\frac{\hat{S}(t_i)\hat{q}_i}{b_i}\]
Variance of Probability Densities
- \[Var[\hat{f}(t_{mi})] = \frac{(\hat{S}(t_i)\hat{q}_i)^2}{b_i^2}\left(\sum_{j=1}^{i-1}\frac{\hat{q}_j}{n_j\hat{p}_j} + \frac{\hat{p}_i}{n_i\hat{q}_i}\right)\]
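For a single interval, the midpoint hazard, its variance, and the midpoint density can be evaluated as in the sketch below. It assumes \(\hat{q}_i > 0\) and a finite width \(b_i\); the function name `hazard_and_density` and the numbers are hypothetical:

```python
def hazard_and_density(s_i, q_i, n_i, b_i):
    """Return (h, Var[h], f) at the midpoint of one interval.

    s_i: S(t_i), survival at the start of the interval
    q_i: conditional probability of failure (must be > 0)
    n_i: number exposed to risk
    b_i: interval width (finite)
    """
    p_i = 1.0 - q_i
    h = 2.0 * q_i / (b_i * (1.0 + p_i))
    var_h = h ** 2 / (n_i * q_i) * (1.0 - (0.5 * h * b_i) ** 2)
    f = s_i * q_i / b_i
    return h, var_h, f

# Hypothetical interval: S(t_i) = 0.75, q_i = 0.5, n_i = 8, width 2
h, var_h, f = hazard_and_density(0.75, 0.5, 8.0, 2.0)
```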
Characteristics of Variable
Median
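- \[ \hat{t}_{m}=t_j + \frac{\hat{S}(t_j) - 0.5}{\hat{f}(t_{mj})} \]
where \([t_j, t_{j+1})\) is the interval for which \(\hat{S}(t_j)\ge 0.5\) and \(\hat{S}(t_{j+1}) < 0.5\).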
Variance of Median
- \[ Var[\hat{t}_{m}] = \frac{\hat{S}(t_0)^2}{4n_0(\hat{f}(t_{mj}))^2} \]
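The median estimate is a linear interpolation inside the interval where the survival estimate crosses 0.5. A minimal sketch, using the identity \(\hat{f}(t_{mj}) = (\hat{S}(t_j)-\hat{S}(t_{j+1}))/b_j\); the function name `actuarial_median` and the sample table are hypothetical:

```python
def actuarial_median(t, surv):
    """Interpolated median from interval start times t and S(t_i) values.

    Returns None when the survival estimate never drops below 0.5.
    """
    for j in range(len(t) - 1):
        if surv[j] >= 0.5 > surv[j + 1]:
            b_j = t[j + 1] - t[j]
            # density at the midpoint: S(t_j) q_j / b_j
            f_mj = (surv[j] - surv[j + 1]) / b_j
            return t[j] + (surv[j] - 0.5) / f_mj
    return None

# Hypothetical life table: interval starts and cumulative survival
t = [0.0, 2.0, 4.0, 6.0]
surv = [1.0, 0.75, 0.375, 0.1875]
median = actuarial_median(t, surv)
```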
Additional Time from Time T until Half of Units Fail
First find the time interval \([t_j, t_{j+1})\) such that \(\hat{S}(t_j)\ge\frac{1}{2}\hat{S}(t_i)\) and \(\hat{S}(t_{j+1}) < \frac{1}{2}\hat{S}(t_i)\). Then
- \[\hat{t}_{mr}(i) = (t_j-t_i)+\frac{b_j\left(\hat{S}(t_j)-\frac{1}{2}\hat{S}(t_i)\right)}{\hat{S}(t_j)-\hat{S}(t_{j+1})}\]
Variance of \(\hat{t}_{mr}(i)\)
- \[Var[\hat{t}_{mr}(i)] = \frac{(\hat{S}(t_i))^2}{4n_i(\hat{f}(t_{mj}))^2}\]
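The search for the crossing interval and the two formulas above can be sketched as follows. Indices are zero-based in the code, and the function name `median_residual_life` and the sample table are assumptions for illustration:

```python
def median_residual_life(i, t, surv, exposed):
    """Return (t_mr, Var[t_mr]) for units alive at t[i], or None.

    Looks for the interval j >= i in which the survival estimate
    falls through half of S(t_i), then interpolates linearly.
    """
    half = 0.5 * surv[i]
    for j in range(i, len(t) - 1):
        if surv[j] >= half > surv[j + 1]:
            b_j = t[j + 1] - t[j]
            drop = surv[j] - surv[j + 1]
            t_mr = (t[j] - t[i]) + b_j * (surv[j] - half) / drop
            f_mj = drop / b_j                      # midpoint density
            var = surv[i] ** 2 / (4.0 * exposed[i] * f_mj ** 2)
            return t_mr, var
    return None

# Hypothetical life table
t = [0.0, 2.0, 4.0, 6.0]
surv = [1.0, 0.75, 0.375, 0.1875]
exposed = [16.0, 8.0, 4.0]
t_mr, var_t_mr = median_residual_life(0, t, surv, exposed)
```

With `i = 0` the result coincides with the overall median, since \(\hat{S}\) equals 1 at the start of the first interval.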
Confidence Interval
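The confidence intervals are calculated using a normal approximation, where \(\hat{x}\) is the estimate and \(SE\) is its standard error (the square root of the estimated variance):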
- Two-sided \(100(1-\alpha)\%\) confidence interval
- \[x_{L} = \hat{x} - z_{1-\alpha/2}\times SE \]
- \[x_{U} = \hat{x} + z_{1-\alpha/2}\times SE\]
- One-sided \(100(1-\alpha)\%\) lower confidence bound
- \[x_{L} = \hat{x} - z_{1-\alpha}\times SE\]
- One-sided \(100(1-\alpha)\%\) upper confidence bound
- \[x_{U} = \hat{x} + z_{1-\alpha}\times SE\]
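A short sketch of the normal-approximation bounds, using the standard library's `statistics.NormalDist` for the quantile \(z\). The function name `normal_ci`, its `kind` argument, and the sample estimate are hypothetical:

```python
from math import sqrt
from statistics import NormalDist

def normal_ci(estimate, variance, alpha=0.05, kind="two-sided"):
    """Return (lower, upper); an unused bound is returned as None."""
    se = sqrt(variance)
    if kind == "two-sided":
        z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
        return estimate - z * se, estimate + z * se
    z = NormalDist().inv_cdf(1.0 - alpha)
    if kind == "lower":
        return estimate - z * se, None
    if kind == "upper":
        return None, estimate + z * se
    raise ValueError("kind must be 'two-sided', 'lower', or 'upper'")

# Hypothetical survival estimate 0.75 with variance 0.01171875
lower, upper = normal_ci(0.75, 0.01171875)
```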
References
- [1] B. W. Turnbull (1976). "The Empirical Distribution Function with Arbitrarily Grouped, Censored and Truncated Data." Journal of the Royal Statistical Society, Series B 38(3): 290–295.
- [2] A. P. Dawid (1979). "Conditional Independence in Statistical Theory." Journal of the Royal Statistical Society, Series B 41(1): 1–31.