2.5.9.2 Algorithm: Nonparametric Distribution Analysis (Arbitrary Censoring)

Uncensored and arbitrarily censored data are both represented as time intervals \((tl_i, tr_i)\), where:

\(tl_i\): lower bound (time of last inspection or last known survival)
\(tr_i\): upper bound (time when failure was first detected)
If \(tr_i = \infty\), it represents right-censoring.
If \(tl_i = tr_i\), it represents exact failure (uncensored).
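The interval rules above can be sketched in a few lines of Python. The function name `classify` and the label strings are illustrative assumptions, not part of the documented interface; any pair that is neither exact nor right-censored is treated here as interval-censored.

```python
import math

def classify(tl, tr):
    """Classify one observation (tl, tr) by the rules listed above."""
    if tl > tr:
        raise ValueError("lower bound must not exceed upper bound")
    if tl == tr:
        return "exact"              # failure time known exactly
    if math.isinf(tr):
        return "right-censored"     # unit was still surviving at tl
    return "interval-censored"      # failure occurred somewhere between tl and tr

observations = [(5.0, 5.0), (3.0, math.inf), (2.0, 4.0)]
labels = [classify(tl, tr) for tl, tr in observations]
```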

1 Turnbull estimation method
2 Actuarial estimation method
3 Confidence Interval
4 Reference

Turnbull estimation method

The time intervals and the probability assigned to each interval are calculated by the Turnbull estimation method[1]. Turnbull developed an iterative algorithm that yields the nonparametric maximum likelihood estimate (NPMLE) of the cumulative distribution function for censored data. The approach covers general cases, including those where the observation intervals overlap.
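As a rough illustration of the iterative idea, the sketch below implements a basic self-consistency (EM) update. It assumes closed observation intervals \([tl_i, tr_i]\), takes as support the "innermost" intervals formed by a left endpoint immediately followed by a right endpoint, and starts from uniform probabilities. The function name `turnbull`, the tolerance and the iteration cap are hypothetical choices; the software's actual implementation may differ in these details.

```python
import math

def turnbull(obs, tol=1e-8, max_iter=10_000):
    """Self-consistency (EM) sketch for closed intervals [tl, tr]."""
    # Label endpoints; on ties a left endpoint (0) sorts before a right one (1).
    ends = sorted([(tl, 0) for tl, _ in obs] + [(tr, 1) for _, tr in obs])
    # An innermost interval is a left endpoint immediately followed by a right one.
    support = [(a[0], b[0]) for a, b in zip(ends, ends[1:])
               if a[1] == 0 and b[1] == 1]
    # alpha[i][j] = 1 when support interval j lies inside observation i.
    alpha = [[1.0 if tl <= q and p <= tr else 0.0 for q, p in support]
             for tl, tr in obs]
    m, n = len(support), len(obs)
    prob = [1.0 / m] * m                      # uniform starting values
    for _ in range(max_iter):
        new = [0.0] * m
        for row in alpha:
            denom = sum(a * pj for a, pj in zip(row, prob))
            for j in range(m):
                new[j] += row[j] * prob[j] / denom   # expected share of observation i
        new = [v / n for v in new]
        done = max(abs(a - b) for a, b in zip(new, prob)) < tol
        prob = new
        if done:
            break
    return support, prob

# Two overlapping intervals and one separate interval.
support, prob = turnbull([(0, 2), (1, 3), (4, 5)])
```

In this small example the two overlapping observations share the innermost interval \([1, 2]\), so it receives two thirds of the probability mass.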

The covariance of the constrained MLE is computed from the observed Fisher information matrix in the reduced parameter space[2], in which the constraint \(\sum p_j=1\) is used to eliminate one of the probabilities.
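As a worked sketch of the reduced-parameter calculation, assume the likelihood has the usual form \(L=\prod_{i=1}^{n}\sum_{j=1}^{m}\alpha_{ij}p_j\), where \(\alpha_{ij}=1\) if the \(j\)th support interval lies inside the \(i\)th observation and 0 otherwise. The symbols \(\alpha_{ij}\), \(m\), \(n\) and \(I\) are introduced here for illustration only. Substituting \(p_m = 1-\sum_{j=1}^{m-1}p_j\) and differentiating \(-\log L\) twice gives, for \(k,l=1,\ldots,m-1\),

\[ I_{kl} = \sum_{i=1}^{n}\frac{(\alpha_{ik}-\alpha_{im})(\alpha_{il}-\alpha_{im})}{\left(\sum_{j=1}^{m}\alpha_{ij}\hat{p}_j\right)^2} \]

\[ Cov[\hat{p}_1,\ldots,\hat{p}_{m-1}] \approx I^{-1}, \qquad Var[\hat{p}_m] \approx \mathbf{1}^{T} I^{-1}\mathbf{1} \]

Under these assumptions, the matrix \(I\) can be singular when some \(\hat{p}_j\) is estimated as zero, so such components may need to be dropped before inversion.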

Actuarial estimation method

First, a clinical life table is constructed. Each interval \([t_i,t_{i+1})\), \(i=1,2,\ldots,s\), runs from \(t_i\) up to, but not including, \(t_{i+1}\), and the survival times and the times to loss or withdrawal are distributed among these intervals. The final interval extends to infinity. The intervals are treated as fixed. From the input censoring information, the following basic quantities are tabulated for the calculation.

\(t_{mi}\) The midpoint of the \(i\)th interval.
\( b_i=t_{i+1}-t_i \) The width of the \(i\)th interval.
\(l_i\) Number lost or withdrawn alive in the \(i\)th interval.
\(d_i\) Number dying in the \(i\)th interval.
\( n_i'=n_{i-1}' - l_{i-1} - d_{i-1} \) Number entering the \(i\)th interval; \(n_1'\) is the total number of units.
\( n_i=n_i' - \frac{1}{2}l_{i} \) Number exposed to risk in the \(i\)th interval.
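The two recursions for \(n_i'\) and \(n_i\) can be tabulated as in the following sketch. The function name `life_table_counts` and the parameter `n_total` (taken as the number entering the first interval) are assumptions made for the example.

```python
def life_table_counts(n_total, lost, died):
    """Return (entering, exposed) per interval from the counts l_i and d_i."""
    entering, exposed = [], []
    n_enter = n_total                          # everyone enters the first interval
    for l_i, d_i in zip(lost, died):
        entering.append(n_enter)
        exposed.append(n_enter - 0.5 * l_i)    # withdrawals count as half exposed
        n_enter = n_enter - l_i - d_i          # units carried into the next interval
    return entering, exposed

entering, exposed = life_table_counts(100, lost=[4, 6, 10], died=[10, 20, 50])
```

With these illustrative counts, 86 units enter the second interval and 83 are treated as exposed to risk in it.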

Actuarial Table

Conditional Probability of Failure

\[ \hat{q}_{i}=d_i/n_i \]

Variance of Conditional Probability of Failure

\[ Var[\hat{q}_{i}]=\frac{\hat{q}_{i}(1-\hat{q}_{i})}{n_i} \]
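A minimal sketch of these two formulas follows. The function name `conditional_failure` is hypothetical, and the code assumes every interval has a positive number exposed to risk.

```python
def conditional_failure(died, exposed):
    """Return q_i = d_i / n_i and its estimated variance for each interval."""
    q = [d / n for d, n in zip(died, exposed)]
    var_q = [q_i * (1.0 - q_i) / n for q_i, n in zip(q, exposed)]
    return q, var_q

q, var_q = conditional_failure(died=[10, 20, 50], exposed=[98.0, 83.0, 55.0])
```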

Survival Probabilities

Cumulative Survival Probabilities

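\[\hat{S}(t_i)= \begin{cases} 1 & \text{if } i = 1, \\ \hat{p}_{i-1}\hat{S}(t_{i-1}) & \text{if } i = 2,\ldots,s \end{cases} \]

Here \(\hat{p}_i = 1-\hat{q}_i\) is the conditional probability of surviving the \(i\)th interval.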

Variance of Cumulative Survival Probabilities

\[Var[\hat{S}(t_i)] = [\hat{S}(t_i)]^2\sum_{j=1}^{i-1}\frac{\hat{q}_j}{n_j\hat{p}_j} \]
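The product recursion and the variance sum can be accumulated in a single pass, as in this sketch. It assumes \(\hat{p}_i = 1-\hat{q}_i\); the function name `cumulative_survival` is illustrative, and an interval with \(\hat{p}_i = 0\) is skipped in the running sum to avoid dividing by zero.

```python
def cumulative_survival(q, exposed):
    """Return S(t_i) and Var[S(t_i)] at the start of each interval."""
    surv, var_surv = [], []
    s, running = 1.0, 0.0              # S(t_1) = 1 and an empty sum for i = 1
    for q_i, n_i in zip(q, exposed):
        surv.append(s)
        var_surv.append(s * s * running)
        p_i = 1.0 - q_i                # conditional survival, assumed p_i = 1 - q_i
        if p_i > 0.0:
            running += q_i / (n_i * p_i)
        s *= p_i                       # S(t_{i+1}) = p_i * S(t_i)
    return surv, var_surv

surv, var_surv = cumulative_survival(q=[0.1, 0.2, 0.5], exposed=[100, 80, 50])
```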

Hazards and Densities

Hazard Estimates

\[\hat{h}(t_{mi})=\frac{2\hat{q}_i}{b_i(1+\hat{p}_i)}\]

Variance of Hazards

\[Var[\hat{h}(t_{mi})] = \frac{(\hat{h}(t_{mi}))^2}{n_i\hat{q}_i}(1-(\frac{1}{2}\hat{h}(t_{mi})b_i)^2)\]

Probability Density Estimates

\[\hat{f}(t_{mi})=\frac{\hat{S}(t_i)\hat{q}_i}{b_i}\]

Variance of Probability Densities

\[Var[\hat{f}(t_{mi})] = \frac{(\hat{S}(t_i)\hat{q}_i)^2}{b_i^2}\left(\sum_{j=1}^{i-1}\frac{\hat{q}_j}{n_j\hat{p}_j} + \frac{\hat{p}_i}{n_i\hat{q}_i}\right)\]
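The hazard estimate, its variance and the density estimate at the interval midpoints could be computed as sketched below. The function name `hazard_and_density` is an assumption, \(\hat{p}_i\) is taken as \(1-\hat{q}_i\), and the code is meant for intervals of finite width only (the open-ended final interval is left out). The hazard variance is set to zero when \(\hat{q}_i = 0\), which is a choice made for this example.

```python
def hazard_and_density(q, exposed, widths, surv):
    """Return a list of (hazard, var_hazard, density) per finite interval."""
    rows = []
    for q_i, n_i, b_i, s_i in zip(q, exposed, widths, surv):
        p_i = 1.0 - q_i
        h = 2.0 * q_i / (b_i * (1.0 + p_i))            # hazard at the midpoint
        if q_i > 0.0:
            var_h = h * h / (n_i * q_i) * (1.0 - (0.5 * h * b_i) ** 2)
        else:
            var_h = 0.0                                 # no failures observed
        f = s_i * q_i / b_i                             # density at the midpoint
        rows.append((h, var_h, f))
    return rows

rows = hazard_and_density(q=[0.1, 0.2], exposed=[100, 80],
                          widths=[1.0, 1.0], surv=[1.0, 0.9])
```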

Characteristics of Variable

Median

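\[ \hat{t}_{m}=t_j + \frac{\hat{S}(t_j) - 0.5}{\hat{f}(t_{mj})} \]

where \([t_j, t_{j+1})\) is the interval for which \(\hat{S}(t_j)\ge 0.5\) and \(\hat{S}(t_{j+1}) < 0.5\).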

Variance of Median

\[ Var[\hat{t}_{m}] = \frac{(\hat{S}(t_0))^2}{4n_0(\hat{f}(t_{mj}))^2} \]

Additional Time from Time T until Half of Units Fail

First find the time interval \([t_j, t_{j+1})\) such that \(\hat{S}(t_j)\ge\frac{1}{2}\hat{S}(t_i)\) and \(\hat{S}(t_{j+1}) < \frac{1}{2}\hat{S}(t_i)\). Then

\[\hat{t}_{mr}(i) = (t_j-t_i)+\frac{b_j\left(\hat{S}(t_j)-\tfrac{1}{2}\hat{S}(t_i)\right)}{\hat{S}(t_j)-\hat{S}(t_{j+1})}\]

Variance of \(\hat{t}_{mr}(i)\)

\[Var[\hat{t}_{mr}(i)] = \frac{(\hat{S}(t_i))^2}{4n_i(\hat{f}(t_{mj}))^2}\]
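A sketch of the interpolation follows. It searches for the interval in which the estimated survival falls through half of \(\hat{S}(t_i)\) and uses \(\hat{f}(t_{mj}) = (\hat{S}(t_j)-\hat{S}(t_{j+1}))/b_j\), which matches the density estimate when \(\hat{S}(t_{j+1})=\hat{p}_j\hat{S}(t_j)\). The function name `median_remaining_life`, the 0-based indexing, and the layout of `bounds` and `surv` (one value per interval boundary) are assumptions made for the example.

```python
def median_remaining_life(i, bounds, surv, exposed):
    """Median additional lifetime from bounds[i] and its variance (0-based i).

    bounds and surv hold one value per interval boundary; exposed holds
    one value per interval.
    """
    half = 0.5 * surv[i]
    for j in range(i, len(bounds) - 1):
        if surv[j] >= half > surv[j + 1]:           # survival crosses half here
            b_j = bounds[j + 1] - bounds[j]
            drop = surv[j] - surv[j + 1]
            t_mr = (bounds[j] - bounds[i]) + b_j * (surv[j] - half) / drop
            f_mj = drop / b_j                       # density at the midpoint of interval j
            var = surv[i] ** 2 / (4.0 * exposed[i] * f_mj ** 2)
            return t_mr, var
    return None   # survival never falls below half within the given boundaries

bounds = [0.0, 1.0, 2.0, 3.0]
surv = [1.0, 0.9, 0.72, 0.36]
exposed = [100, 80, 50]
median, var_median = median_remaining_life(0, bounds, surv, exposed)
remaining, var_remaining = median_remaining_life(1, bounds, surv, exposed)
```

Calling the function at the first boundary, where the survival estimate is 1, gives the ordinary median as a special case.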

Confidence Interval

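The confidence intervals are calculated using a normal approximation, where \(\hat{x}\) is the estimate, \(SE=\sqrt{Var[\hat{x}]}\) is its standard error, and \(z_{\gamma}\) is the \(\gamma\) quantile of the standard normal distribution: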

Two-sided \(100(1-\alpha)\%\) confidence interval
\[x_{L} = \hat{x} - z_{1-\alpha/2}\times SE \]

\[x_{U} = \hat{x} + z_{1-\alpha/2}\times SE\]
One-sided \(100(1-\alpha)\%\) lower confidence bound
\[x_{L} = \hat{x} - z_{1-\alpha}\times SE\]
One-sided \(100(1-\alpha)\%\) upper confidence bound
\[x_{U} = \hat{x} + z_{1-\alpha}\times SE\]
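The three cases can be written as one small helper using the standard library's normal quantile function. The name `normal_bounds` and the `kind` argument are illustrative assumptions; a bound that does not apply is returned as `None`.

```python
from statistics import NormalDist

def normal_bounds(estimate, se, alpha=0.05, kind="two-sided"):
    """Normal-approximation confidence limits as (lower, upper)."""
    if kind == "two-sided":
        z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
        return estimate - z * se, estimate + z * se
    z = NormalDist().inv_cdf(1.0 - alpha)       # one-sided quantile
    if kind == "lower":
        return estimate - z * se, None
    if kind == "upper":
        return None, estimate + z * se
    raise ValueError("kind must be 'two-sided', 'lower' or 'upper'")

lower, upper = normal_bounds(0.72, 0.05)         # two-sided 95% interval
```

Because the approximation is symmetric, limits for a probability can fall outside \([0, 1]\) when the estimate is close to 0 or 1.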

Reference

[1] B. W. Turnbull (1976). "The Empirical Distribution Function with Arbitrarily Grouped, Censored and Truncated Data". Journal of the Royal Statistical Society, Series B, 38(3): 290-295.
[2] A. P. Dawid (1979). "Conditional Independence in Statistical Theory". Journal of the Royal Statistical Society, Series B, 41(1): 1-31.