17.6.2.2 Algorithms (Cox Proportional Hazard Regression)

Let \(t_i\,\!\),for i = 1, 2, ?, n, be the failure time or censored time for the ith observation with the vector of p covariates \(Z_j(j=1,2,\ldots ,p)\). It is assumed that the failure and censored mechanisms are independent. The hazard function, \(\lambda (z,t)\,\!\) , is the probability that an individual with covariates z fails at time t given that the individual survived up to time t. In the Cox proportional model is of the form:

\[\lambda (z,t)=\lambda _0(t)\exp (z^{T}\beta +\omega )\,\!\]

where \(\lambda _0\,\!\) is the base-line hazard function, an unspecified function of time, \(\beta \,\!\) is a vector of unknown parameters and \(\omega\,\!\) is a known offset.

Assuming there are ties in the failure time giving \(n_d < n\,\!\) distinct failure times, \(t_{(1)} < t_{(2)} < ?< t_{(nd)}\) , such that \(d_i\,\!\) individuals fail at \(t_{(i)}\,\!\) , it follows that the marginal likelihood for \(\beta\) is well approximated by:

\[L=\prod_{i=1}^{n_d}\frac{\exp (s_i^{T}\beta +\omega _i)}{[\sum_{l\in R(t_{(1)})}\exp (z_i^{T}\beta +\omega _i)]^{d_{i}}}\]

(1)

where \(s_i\,\!\) is the sum of covariate of individuals observed fail at \(t_{(i)}\,\!\) and \(R(t_{(i)})\,\!\) is the set of individuals at risk just prior to \(t_{(i)}\,\!\) , that is it is all the individuals that fail or censored at time \(t_{(i)}\) along with all individuals survived beyond the time \(t_{(i)}\,\!\) . The MLE (maximum likelihood estimates) of \(\beta\,\!\), given by\(\hat \beta\,\!\), are obtained by maximizing (1) using a Newton-Raphson iteration technique that includes step having and utilizes the first and second partial derivatives of (1) which are given by (2) and (3) below:

\[U_j(\beta )=\frac{\partial Ln(L)}{\partial \beta _j}=\sum_{i=1}^{n_d}[s_{ji}-d_i\alpha _{ji}(\beta )]=0\]

(2)

for j = 1, 2, ?, p, where \(s_{ji}\,\!\) is the jth element in the vector \(s_i\,\!\) and

\[\alpha _{ji}(\beta )=\frac{\sum_{l\in R(t_{(1)})}z_{jl}\exp (z_l^{T}\beta +\omega _l)}{\sum_{l\in R(t_{(1)})}\exp (z_l^{T}\beta +\omega _l)}\]

Similarly,

\[I_{hj}(\beta )=-\frac{\partial ^2Ln(L)}{\partial \beta _h\partial \beta _j}=\sum_{i=1}^{n_d}d_i\gamma _{hji}\]

(3)

where \(\gamma _{hji}=\frac{\sum_{l\in R(t_{(1)})}z_{hl}z_{jl}\exp (z_l^{T}\beta +\omega _l)}{\sum_{l\in R(t_{(1)})}\exp (z_l^{T}\beta +\omega _l)}-\alpha _{hi}(\beta )\alpha _{ji}(\beta )\) h, j = 1, ? p.

\(U_j(\beta )\,\!\) is the jth component of a score vector \(I_{hi}(\beta )\,\!\) is the (h, j) element of the observed information matrix \(I(\beta )\,\!\) whose inverse \(I(\beta )^{-1}=I_{hi}(\beta )^{-1}\,\!\) gives the variance-covariance matrix of \(\beta\,\!\).

It should be noted that if a covariate or a linear combination of covariates is monotonically increasing or decreasing with time ,then one or more of the \(\beta _j^{\prime }s\) will be infinite.

If \(\lambda _0(t)\,\!\) varies across \(\nu\,\!\) strata, where the number of individuals in the kth stratum is \(n_k\,\!\), k = 1, ?, \(\nu\,\!\), with \(n=\sum_{k=1}^\nu n_k\) , then rather than maximizing (1) to obtain \(\hat \beta\,\!\), the following marginal likelihood is maximized:

\[L=\prod_{k=1}^\nu L_k\]

(4)

where \(L_k\,\!\) is the contribution to likelihood for the \(n_k\,\!\) observations in the kth stratum treated as a single sample in (1). When strata are concluded the covariate coefficients are constant across strata but there is a different base-line hazard function \(\lambda _0(t)\,\!\).

The base-line survival function associated with a failure time \(t_{(i)}\,\!\) , is estimated as

\(exp(-\hat H(t_{(i)}))\) ,

where \(\hat H(t_{(i)})=\sum_{t(j)\leq t(i)}(\frac{d_i}{\sum_{l\in R(t_{(j)})}\exp (z_l^T\hat \beta +\omega _l)})\)

and \(d_i\,\!\) is the number of failures at time \(t_{(i)}\,\!\) . The residual of the lth observation is computed as:

\[r(t_l)=\hat H(t_l)\exp (-z_l^T\hat \beta +\omega _l)\]

where \(\hat H(t_l)=\hat H(t_{(i)}),t_{(i)}\leq t_l<t_{(i+1)}\) .

The deviance is defined as \(-2^*\,\!\) (logarithm of marginal likelihood). There are two ways to test whether individual covariates are significant: the differences between the deviances of nested models can be compared with appropriate \(\chi ^2\,\!\)-distribution; or, the asymptotic normality of the parameter estimates can be used to form z-test by dividing the estimates by their standard errors or the score function for the model under the null hypothesis can be used to form z-test.