

COX (PROPORTIONAL HAZARDS) REGRESSION

 

Menu location: Analysis_Survival_Cox Regression.

 

This function fits Cox's proportional hazards model for survival-time
(time-to-event) outcomes on one or more predictors.

 

Cox regression (or proportional hazards regression) is a method for investigating
the effect of several variables upon the time a specified event takes to happen.
In the context of an outcome such as death this is known as Cox regression for
survival analysis. The method does not assume any particular "survival model",
but it is not truly nonparametric (it is often called semiparametric) because it
does assume that the effects of the predictor variables upon survival are
constant over time and are additive on one scale (the log hazard). You should
not use Cox regression without the guidance of a Statistician.
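In the usual textbook notation (assumed here; this page does not write the model out), the hazard for a subject with predictor values $x_1, \dots, x_p$ is an unspecified baseline hazard $h_0(t)$ multiplied by an exponential term in the coefficients $\beta_1, \dots, \beta_p$. The baseline cancels in the ratio of two subjects' hazards, so that ratio does not depend on $t$; this is the constancy-over-time assumption described above, and the additivity is on the log-hazard scale.

```latex
h(t \mid x) = h_0(t)\,\exp(\beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p)

\frac{h(t \mid x^{(a)})}{h(t \mid x^{(b)})}
  = \exp\Big(\sum_{k=1}^{p} \beta_k \big(x^{(a)}_k - x^{(b)}_k\big)\Big)
```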

 

Provided that the assumptions of Cox regression are met, this function will
provide better estimates of survival probabilities and cumulative hazard than
those provided by the Kaplan-Meier function.

 

Hazard and hazard-ratios

Cumulative hazard at a time t is the hazard accumulated between time 0 and time
t; when it is small it approximates the risk of dying between time 0 and time t.
The survivor function at time t is the probability of surviving to time t (see
also Kaplan-Meier estimates).
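For a continuous survival time, the standard relationships between the hazard $h(t)$, the cumulative hazard $H(t)$ and the survivor function $S(t)$ are shown below (textbook notation, assumed here). Under the Cox model, with coefficients $\beta_k$ and predictors $x_k$, the covariate-specific survivor function is the baseline survivor function $S_0(t)$ raised to the power of the exponential term.

```latex
H(t) = \int_0^t h(u)\,du, \qquad S(t) = \exp\{-H(t)\}, \qquad
1 - S(t) \approx H(t) \ \text{when } H(t) \text{ is small}

S(t \mid x) = S_0(t)^{\exp(\beta_1 x_1 + \cdots + \beta_p x_p)}
```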

 

The coefficients in a Cox regression relate to hazard; a positive coefficient
indicates a worse prognosis and a negative coefficient indicates a protective
effect of the variable with which it is associated.

 

The hazard ratio associated with a predictor variable is given by the exponential
of its coefficient, exp(b); this is given with a confidence interval under the
"coefficient details" option in StatsDirect. The hazard ratio may also be
thought of as the relative death rate, see Armitage and Berry (1994). The
interpretation of the hazard ratio depends upon the measurement scale of the
predictor variable in question: for a continuous predictor it is the ratio per
one-unit increase, and for a 0/1 predictor it compares the two groups. See Sahai
and Khurshid (1996) for further information on relative risks and hazards.
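Assuming the interval is the usual Wald interval computed on the log scale, exp(b ± z·SE) (the formula is not stated on this page), the hazard ratio and its 95% confidence interval can be reproduced from the coefficient and standard error reported in the example below. `hazard_ratio_ci` is a hypothetical helper, not a StatsDirect function.

```python
import math
from statistics import NormalDist

def hazard_ratio_ci(coef, se, level=0.95):
    """Return exp(coef) and a Wald interval exp(coef -/+ z * se)."""
    # z is the upper (1 + level) / 2 quantile of N(0, 1): about 1.959964 for 95%.
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return math.exp(coef), math.exp(coef - z * se), math.exp(coef + z * se)

# Coefficient and standard error for "Stage group" from the example output.
hr, lower, upper = hazard_ratio_ci(0.96102, 0.385636)
print(f"hazard ratio {hr:.6f}, 95% CI {lower:.6f} to {upper:.6f}")
```

With these inputs the results agree closely with the reported hazard ratio and interval; small differences in the last digits come from the rounding of the printed coefficient and standard error.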

 

Time-dependent and fixed covariates

In prospective studies, when individuals are followed over time, the values of
covariates may change with time. Covariates can thus be divided into fixed and
time-dependent. A covariate is time-dependent if the difference between its
values for two different subjects changes with time, e.g. serum cholesterol. A
covariate is fixed if its values cannot change with time, e.g. sex or race.
Lifestyle factors and physiological measurements such as blood pressure are
usually time-dependent. Cumulative exposures such as smoking are also
time-dependent but are often forced into an imprecise dichotomy, i.e. "exposed"
vs. "not-exposed" instead of the more meaningful "time of exposure". There are
no hard and fast rules about the handling of time-dependent covariates. If you
are considering using Cox regression you should seek the help of a Statistician,
preferably at the design stage of the investigation.
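One common way to represent a time-dependent covariate, used by many packages but not described for StatsDirect here, is to split each subject's follow-up into (start, stop] intervals within which the covariate is constant, with only the final interval carrying the subject's event indicator. The sketch below is a hypothetical illustration of that layout; the function name, record fields and cholesterol values are all assumptions.

```python
def split_follow_up(subject_id, stop_time, event, changes):
    """Split one subject's follow-up into (start, stop] intervals.

    changes: (time, value) pairs sorted by time, giving the covariate value that
    applies from that time onwards; the first pair should be at time 0.
    Returns tuples (subject_id, start, stop, value, event_in_interval).
    """
    rows = []
    for k, (start, value) in enumerate(changes):
        if start >= stop_time:
            break  # measurement taken after follow-up ended
        # The interval ends at the next change or at the end of follow-up.
        end = changes[k + 1][0] if k + 1 < len(changes) else stop_time
        end = min(end, stop_time)
        is_last = end == stop_time
        rows.append((subject_id, start, end, value, event if is_last else 0))
    return rows

# Hypothetical subject: cholesterol measured on days 0, 90 and 180; died on day 200.
for row in split_follow_up("S1", 200, 1, [(0, 6.1), (90, 5.4), (180, 5.9)]):
    print(row)
```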

 

Model analysis and deviance

A test of the overall statistical significance of the model is given under the
"model analysis" option. Here the likelihood chi-square statistic is calculated
by comparing the deviance (- 2 * log likelihood) of your model, with all of the
covariates you have specified, against the model with all covariates dropped.
The individual contribution of covariates to the model can be assessed from the
significance test given with each coefficient in the main output; this assumes a
reasonably large sample size.
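Assuming the per-coefficient test is the usual Wald test (coefficient divided by its standard error, referred to the standard normal distribution, which is why a reasonably large sample is needed), the z and P values in the example below can be reproduced from the reported coefficient and standard error. `wald_test` is a hypothetical helper.

```python
import math

def wald_test(coef, se):
    """Wald z statistic and two-sided P value for H0: coefficient = 0."""
    z = coef / se
    # erfc(|z| / sqrt(2)) equals 2 * (1 - Phi(|z|)), the two-sided normal tail area.
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p

z, p = wald_test(0.96102, 0.385636)
print(f"z = {z:.6f}, P = {p:.4f}")
```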

 

Deviance is minus twice the log of the likelihood ratio for models fitted by
maximum likelihood (Hosmer and Lemeshow, 1989 and 1999; Cox and Snell, 1989;
Pregibon, 1981). The value of adding a parameter to a Cox model is tested by
subtracting the deviance of the model with the new parameter from the deviance
of the model without the new parameter; the difference is then tested against a
chi-square distribution with degrees of freedom equal to the difference between
the degrees of freedom of the old and new models. The model analysis option
tests the model you specify against the null model with no covariates (a Cox
model has no intercept term; the baseline hazard takes its place); this tests
the combined value of the specified predictors/covariates in the model.
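The deviance comparison can be reproduced from the two log likelihoods printed under "model analysis" in the example below. This minimal sketch handles only nested models that differ by one parameter (df = 1), where the chi-square tail area has a closed form via `math.erfc`; `likelihood_ratio_test` is a hypothetical helper.

```python
import math

def likelihood_ratio_test(loglik_reduced, loglik_full):
    """LR chi-square for nested models differing by one parameter (df = 1)."""
    # Difference in deviances: (-2 * LL_reduced) - (-2 * LL_full).
    chi2 = -2.0 * (loglik_reduced - loglik_full)
    # For 1 df, P(chi-square > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    p = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p

chi2, p = likelihood_ratio_test(-207.554801, -203.737609)
print(f"chi-square = {chi2:.6f}, df = 1, P = {p:.4f}")
```

For more than one added parameter the same statistic applies, but the P value needs a general chi-square survival function.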

 

Some statistical packages offer stepwise Cox regression that performs systematic
tests for different combinations of predictors/covariates. Automatic model
building procedures such as these can be misleading as they do not consider the
real-world importance of each predictor. For this reason StatsDirect does not
include stepwise selection.

 

Survival and cumulative hazard rates

The survival/survivorship function and the cumulative hazard function (as
discussed under Kaplan-Meier) are calculated relative to the baseline (lowest
value of covariates) at each time point. Cox regression provides a better
estimate of these functions than the Kaplan-Meier method when the assumptions of
the Cox model are met and the fit of the model is strong.
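Once a baseline survival value is available at some time t, the survival for other covariate values follows from the model as the baseline raised to the power exp(sum of b(x - reference)). The sketch below assumes that relationship and a baseline defined at the reference values (the minimum, as described above); the function name and the baseline value of 0.8 are hypothetical.

```python
import math

def adjusted_survival(baseline_surv, coefs, x, x_ref):
    """Survival and cumulative hazard at covariate values x, at one time point.

    baseline_surv: baseline survival S0(t), with 0 < S0(t) <= 1, defined at the
    reference covariate values x_ref (e.g. the minimum of each covariate).
    Uses S(t | x) = S0(t) ** exp(sum_k b_k * (x_k - ref_k)) and H = -log S.
    """
    lin = sum(b * (xk - rk) for b, xk, rk in zip(coefs, x, x_ref))
    surv = baseline_surv ** math.exp(lin)
    return surv, -math.log(surv)

# Hypothetical baseline survival of 0.8 at some time t for stage group 1 (the
# minimum), combined with the example's stage group coefficient.
surv_stage4, cumhaz_stage4 = adjusted_survival(0.8, [0.96102], [2], [1])
print(surv_stage4, cumhaz_stage4)
```

When S0(t) = exp(-H0(t)), the returned cumulative hazard equals H0(t) multiplied by the hazard ratio, so the two functions stay consistent.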

 

You are given the option to 'centre continuous covariates'. This makes survival
and hazard functions relative to the mean of continuous variables rather than
relative to the minimum; the mean is usually the more meaningful comparison.
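Centring changes only the reference point, not the coefficients or hazard ratios. A minimal sketch, assuming the baseline is moved with the same power relationship used by the Cox model; the age covariate, its coefficient of 0.03 per year and the baseline value are hypothetical.

```python
import math

def rebase_baseline(baseline_surv, coefs, old_ref, new_ref):
    """Move a baseline survival value from one reference point to another.

    Coefficients (and hazard ratios) are unchanged by centring; only the
    reference changes: S0_new(t) = S0_old(t) ** exp(sum_k b_k * (new_k - old_k)).
    """
    lin = sum(b * (n - o) for b, n, o in zip(coefs, new_ref, old_ref))
    return baseline_surv ** math.exp(lin)

# Hypothetical continuous covariate (age in years, coefficient 0.03 per year).
ages = [40, 52, 58, 63, 77]
mean_age = sum(ages) / len(ages)
# Baseline survival 0.9 at the minimum age, re-expressed at the mean age.
print(rebase_baseline(0.9, [0.03], [min(ages)], [mean_age]))
```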

 

If you have binary/dichotomous predictors in your model you are given the option
to calculate survival and cumulative hazards for each variable separately.

 

Data preparation

 * Time-to-event, e.g. time a subject in a trial survived.
   
 * Event / censor code - this must be ≥1 (event(s) happened) or 0 (no event at
   the end of the study, i.e. "right censored").
   
 * Strata - e.g. centre code for a multi-centre trial. Be careful with your
   choice of strata; seek the advice of a Statistician.
   
 * Predictors - these are also referred to as covariates: one or more variables
   thought to be related to the event under study. If a predictor is a
   classifier variable with more than two classes (i.e. ordinal or nominal),
   you must first use the dummy variable function to convert it to a series of
   binary variables.
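The conversion of a multi-class predictor into binary variables can be sketched as k - 1 indicator columns for k classes. One class is left out as the reference because a Cox model has no intercept: a full set of k indicators sums to a constant, which is absorbed into the baseline hazard and cannot be estimated. This is a hypothetical helper; StatsDirect's dummy variable function may choose the reference class or name the columns differently.

```python
def dummy_code(values, reference=None):
    """Convert a classifier with k levels into k - 1 binary (0/1) columns.

    The reference level (default: the first level in sorted order) gets no
    column, so each coefficient compares one level with the reference.
    """
    levels = sorted(set(values))
    if reference is None:
        reference = levels[0]
    return reference, {
        f"level_{lvl}": [1 if v == lvl else 0 for v in values]
        for lvl in levels
        if lvl != reference
    }

# Hypothetical nominal predictor with three classes.
ref, columns = dummy_code(["A", "C", "B", "A", "C"])
print(ref, columns)
```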

 

Technical validation

StatsDirect optimises the log likelihood associated with a Cox regression model
until the change in log likelihood with iterations is less than the accuracy
that you specify in the dialog box that is displayed just before the calculation
takes place (Lawless, 1982; Kalbfleisch and Prentice, 1980; Harris, 1991; Cox
and Oakes, 1984; Le, 1997; Hosmer and Lemeshow, 1999).
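The iterative fit can be illustrated with a minimal Newton-Raphson sketch for a single covariate. It is not StatsDirect's code: it assumes Breslow's handling of ties (as stated below for StatsDirect), starts from a coefficient of zero, has no step-halving, strata or data splitting, and takes the standard error as one over the square root of the observed information. The function name is hypothetical, and its `tol` parameter plays the role of the accuracy entered in the dialog box.

```python
import math

def cox_fit_one_covariate(times, events, x, tol=1e-8, max_iter=50):
    """Fit a one-covariate Cox model by Newton-Raphson (Breslow partial likelihood).

    Stops when the change in log partial likelihood is below tol.
    Returns (coefficient, standard_error, log_likelihood).
    """
    event_times = sorted({t for t, d in zip(times, events) if d})

    def loglik_score_info(b):
        ll = score = info = 0.0
        for tj in event_times:
            # Risk set: subjects still under observation just before tj.
            risk = [i for i, t in enumerate(times) if t >= tj]
            a0 = sum(math.exp(b * x[i]) for i in risk)
            a1 = sum(x[i] * math.exp(b * x[i]) for i in risk)
            a2 = sum(x[i] ** 2 * math.exp(b * x[i]) for i in risk)
            dead = [i for i, t in enumerate(times) if t == tj and events[i]]
            d = len(dead)
            s = sum(x[i] for i in dead)
            # Breslow: all d tied events share the same risk-set denominator.
            ll += b * s - d * math.log(a0)
            score += s - d * a1 / a0
            info += d * (a2 / a0 - (a1 / a0) ** 2)
        return ll, score, info

    b = 0.0
    ll, score, info = loglik_score_info(b)
    for _ in range(max_iter):
        b_new = b + score / info  # Newton-Raphson step
        ll_new, score, info = loglik_score_info(b_new)
        converged = abs(ll_new - ll) < tol
        b, ll = b_new, ll_new
        if converged:
            break
    return b, 1.0 / math.sqrt(info), ll

# Hypothetical four-subject data set: here the score equation reduces to
# u**2 - u - 4 = 0 with u = exp(b), so exp(b) = (1 + sqrt(17)) / 2.
b, se, ll = cox_fit_one_covariate([1, 2, 3, 4], [1, 1, 1, 1], [1, 0, 1, 0])
print(b, math.exp(b), se, ll)
```

If StatsDirect maximises the same Breslow partial likelihood, running this on the Time, Censor and Stage group columns of the example should give a coefficient close to the reported b1.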

 

The calculation options dialog box sets a value (default is 10000) for
"SPLITTING RATIO"; this is the ratio in proportionality constant at a time t
above which StatsDirect will split your data into more strata and calculate an
extended likelihood solution, see Bryson and Johnson (1981).

 

Ties are handled by Breslow's approximation (Breslow, 1974).

 

Cox-Snell residuals are calculated as specified by Cox and Oakes (1984).
Cox-Snell, Martingale and deviance residuals are calculated as specified by
Collett (1994).
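The commonly used definitions of these three residuals can be sketched for a single subject, given that subject's fitted cumulative hazard at their own survival or censoring time. This is a hypothetical helper following the standard textbook formulas, not StatsDirect's implementation.

```python
import math

def cox_residuals(event, fitted_cumhaz):
    """Cox-Snell, martingale and deviance residuals for one subject.

    event: event/censor code (>= 1 for an event, 0 if censored).
    fitted_cumhaz: the subject's fitted cumulative hazard at their own time,
    H0(t_i) * exp(b'x_i), which must be > 0.
    """
    delta = 1 if event else 0
    cox_snell = fitted_cumhaz
    martingale = delta - cox_snell
    # Deviance residual: sign(m) * sqrt(-2 * (m + delta * log(delta - m))).
    inner = martingale + (delta * math.log(delta - martingale) if delta else 0.0)
    # max() guards against a tiny positive value from floating-point rounding.
    deviance = math.copysign(math.sqrt(max(0.0, -2.0 * inner)), martingale)
    return cox_snell, martingale, deviance

print(cox_residuals(1, 0.5))  # subject with an event
print(cox_residuals(0, 0.5))  # censored subject
```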

 

Baseline survival and cumulative hazard rates are calculated at each time.
Maximum likelihood methods are used, which are iterative when there is more than
one death/event at an observed time (Kalbfleisch and Prentice, 1973). Other
software may use the less precise Breslow estimates for these functions.
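For comparison, the Breslow estimate that other software may use can be sketched as follows. This is not the maximum likelihood (Kalbfleisch and Prentice) method that StatsDirect uses. The sketch's baseline corresponds to x = 0, so subtract the reference value (e.g. the minimum) from x first to match a different baseline; with the coefficient set to zero it reduces to the Nelson-Aalen estimate. The function name is hypothetical.

```python
import math

def breslow_baseline_cumhaz(times, events, x, coef):
    """Breslow estimate of the baseline cumulative hazard H0(t) at each event time.

    H0(t) = sum over event times t_j <= t of
            d_j / sum_{i in risk set at t_j} exp(coef * x_i).
    """
    result = []
    cumhaz = 0.0
    for tj in sorted({t for t, e in zip(times, events) if e}):
        d = sum(1 for t, e in zip(times, events) if t == tj and e)
        denom = sum(math.exp(coef * xi) for t, xi in zip(times, x) if t >= tj)
        cumhaz += d / denom
        result.append((tj, cumhaz))
    return result

print(breslow_baseline_cumhaz([1, 2, 3], [1, 0, 1], [0, 1, 1], 0.5))
```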

 

Example

From Armitage and Berry (1994, p. 479).

Test workbook (Survival worksheet: Stage Group, Time, Censor).

 

The following data represent the survival in days since entry to the trial of
patients with diffuse histiocytic lymphoma. Two different groups of patients,
those with stage III and those with stage IV disease, are compared.

 

Stage 3: 6, 19, 32, 42, 42, 43*, 94, 126*, 169*, 207, 211*, 227*, 253, 255*,
270*, 310*, 316*, 335*, 346*

 

Stage 4: 4, 6, 10, 11, 11, 11, 13, 17, 20, 20, 21, 22, 24, 24, 29, 30, 30, 31,
33, 34, 35, 39, 40, 41*, 43*, 45, 46, 50, 56, 61*, 61*, 63, 68, 82, 85, 88, 89,
90, 93, 104, 110, 134, 137, 160*, 169, 171, 173, 175, 184, 201, 222, 235*, 247*,
260*, 284*, 290*, 291*, 302*, 304*, 341*, 345*

 

* = censored data (patient still alive or died from an unrelated cause)

 

To analyse these data in StatsDirect you must first prepare them in three
workbook columns as shown below:

 

Stage group  Time  Censor
1            6     1
1            19    1
1            32    1
1            42    1
1            42    1
1            43    0
1            94    1
1            126   0
1            169   0
1            207   1
1            211   0
1            227   0
1            253   1
1            255   0
1            270   0
1            310   0
1            316   0
1            335   0
1            346   0
2            4     1
2            6     1
2            10    1
2            11    1
2            11    1
2            11    1
2            13    1
2            17    1
2            20    1
2            20    1
2            21    1
2            22    1
2            24    1
2            24    1
2            29    1
2            30    1
2            30    1
2            31    1
2            33    1
2            34    1
2            35    1
2            39    1
2            40    1
2            41    0
2            43    0
2            45    1
2            46    1
2            50    1
2            56    1
2            61    0
2            61    0
2            63    1
2            68    1
2            82    1
2            85    1
2            88    1
2            89    1
2            90    1
2            93    1
2            104   1
2            110   1
2            134   1
2            137   1
2            160   0
2            169   1
2            171   1
2            173   1
2            175   1
2            184   1
2            201   1
2            222   1
2            235   0
2            247   0
2            260   0
2            284   0
2            290   0
2            291   0
2            302   0
2            304   0
2            341   0
2            345   0
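If the data are held as the starred lists given above rather than in a workbook, a short script can produce the same three columns (group 1 = stage III, group 2 = stage IV; censor code 1 = death, 0 = censored). `to_columns` is a hypothetical helper, not a StatsDirect feature.

```python
def to_columns(group, entries):
    """Turn entries like "43*" into (group, time, censor) rows; * marks censoring."""
    rows = []
    for entry in entries.split(","):
        entry = entry.strip()
        censored = entry.endswith("*")
        rows.append((group, int(entry.rstrip("*")), 0 if censored else 1))
    return rows

stage_3 = "6, 19, 32, 42, 42, 43*, 94, 126*, 169*, 207, 211*, 227*, 253, 255*, 270*, 310*, 316*, 335*, 346*"
stage_4 = ("4, 6, 10, 11, 11, 11, 13, 17, 20, 20, 21, 22, 24, 24, 29, 30, 30, 31, 33, 34, 35, "
           "39, 40, 41*, 43*, 45, 46, 50, 56, 61*, 61*, 63, 68, 82, 85, 88, 89, 90, 93, 104, "
           "110, 134, 137, 160*, 169, 171, 173, 175, 184, 201, 222, 235*, 247*, 260*, 284*, "
           "290*, 291*, 302*, 304*, 341*, 345*")
rows = to_columns(1, stage_3) + to_columns(2, stage_4)
print(len(rows), "subjects with", sum(c for _, _, c in rows), "events")
```

The counts printed by this script match the "80 subjects with 54 events" line of the output below.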

 

Alternatively, open the test workbook using the file open function of the file
menu. Then select Cox regression from the survival analysis section of the
analysis menu. Select the column marked "Time" when asked for the times, select
"Censor" when asked for death/censorship, click on the cancel button when asked
about strata, and select the column marked "Stage group" when asked about
predictors.

 

For this example:

 

Cox (proportional hazards) regression

80 subjects with 54 events

Deviance (likelihood ratio) chi-square = 7.634383 df = 1 P = 0.0057

 

Stage group b1 = 0.96102 z = 2.492043 P = 0.0127

 

Cox regression - hazard ratios

 

Parameter      Hazard ratio   95% CI
Stage group    2.614362       1.227756 to 5.566976

Parameter      Coefficient    Standard Error
Stage group    0.96102        0.385636

 

Cox regression - model analysis

Log likelihood with no covariates = -207.554801

Log likelihood with all model covariates = -203.737609

Deviance (likelihood ratio) chi-square = 7.634383 df = 1 P = 0.0057

 

The significance test for the coefficient b1 tests the null hypothesis that it
equals zero and thus that its exponential equals one. The confidence interval for
exp(b1) is therefore the confidence interval for the relative death rate or
hazard ratio; we may therefore infer with 95% confidence that the death rate
from stage 4 cancers is approximately 2.6 times, and at least 1.2 times, the
rate from stage 3 cancers.



 

 

Copyright © 2000-2022 StatsDirect Limited, all rights reserved.