Two different results from coxph in R, using same stop and start times, why?

Question

I am running into a roadblock in my survival analysis; I think it has to do with censoring type. Here are the first 30 lines of my survival data. tstart is when a patient is admitted and starts receiving the Intervention, tstop is either death (status = 1) or discharge (censored, status = 0):

    tstart tstop status Intervention
1        2    14      0        FALSE
2        2     5      0        FALSE
3        2    10      1        FALSE
4        5     8      0        FALSE
5        6    10      0        FALSE
6        6    10      0        FALSE
7        7    10      0        FALSE
8        8    20      1         TRUE
9        8    25      0        FALSE
10       8    18      0        FALSE
11       8    11      0        FALSE
12       8     9      0        FALSE
13       9    11      0        FALSE
14       9    52      0         TRUE
15       9    26      1        FALSE
16      10    20      1         TRUE
17      10    14      0        FALSE
18      10    14      0        FALSE
19      10    11      0        FALSE
20      10    23      0         TRUE
21      10    26      0         TRUE
22      10    16      0        FALSE
23      11    21      0         TRUE
24      11    96      0         TRUE
25      11    14      0        FALSE
26      11    16      0         TRUE
27      11    14      0        FALSE
28      11    16      0        FALSE
29      11    16      0        FALSE
30      11    38      1         TRUE

Depending on how I enter this data into the coxph function, I get two different results. Namely:

# METHOD ONE:
> coxph (Surv (time = (tstop - tstart), event = status) ~ Intervention, data = df.use)
Call:
coxph(formula = Surv(time = (tstop - tstart), event = status) ~ 
    Intervention, data = df.use)

                     coef exp(coef) se(coef)      z     p
InterventionTRUE -0.05975   0.94200  0.04727 -1.264 0.206

Likelihood ratio test=1.58  on 1 df, p=0.2084
n= 7362, number of events= 2364 

# METHOD TWO:
> coxph (Surv (time = tstart, time2 = tstop, event = status) ~ Intervention, data = df.use)
Call:
coxph(formula = Surv(time = tstart, time2 = tstop, event = status) ~ 
    Intervention, data = df.use)

                     coef exp(coef) se(coef)      z             p
InterventionTRUE -0.29936   0.74129  0.04902 -6.106 0.00000000102

Likelihood ratio test=35.67  on 1 df, p=0.000000002337
n= 7362, number of events= 2364

I thought the two methods would return the same hazard ratio, but the results are extremely different. Why is this? How can it be avoided?
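
A quick way to see why the two calls can disagree is to look at what each Surv() object encodes. The sketch below is a minimal illustration on the first five rows of the sample, assuming the survival package is installed; df.small and the at_risk_* names are hypothetical. According to the Surv help page, the default type is "right" when time2 is absent and "counting" when it is present, so Method Two should be read as (start, stop] at-risk intervals on the admission time scale rather than as durations since admission.

```r
library(survival)

# First five rows of the sample above (df.small is a hypothetical name).
df.small <- data.frame(
  tstart = c(2, 2, 2, 5, 6),
  tstop  = c(14, 5, 10, 8, 10),
  status = c(0, 0, 1, 0, 0)
)

# Method One: one follow-up duration per patient.
s1 <- with(df.small, Surv(time = tstop - tstart, event = status))
# Method Two: time and time2 supplied together with event.
s2 <- with(df.small, Surv(time = tstart, time2 = tstop, event = status))

# How the survival package classified each object.
type1 <- attr(s1, "type")
type2 <- attr(s2, "type")
print(c(method_one = type1, method_two = type2))

# Patients counted as at risk when patient 3 dies.
# Duration scale: everyone still followed 8 days after their own admission.
dur <- df.small$tstop - df.small$tstart
at_risk_one <- which(dur >= 8)
# (start, stop] scale: everyone already admitted and still present at time 10.
at_risk_two <- which(df.small$tstart < 10 & df.small$tstop >= 10)
print(at_risk_one)
print(at_risk_two)
```

On these five rows the death of patient 3 is compared against patients 1 and 3 under Method One, but against patients 1, 3 and 5 under Method Two. The Cox partial likelihood is built from exactly these risk sets, so the two fits need not agree.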

This is more of a stats question than a programming question. The short answer is that the two methods are 2 different things. You should only use time2 for interval censored data. Your study design implies right-censored data and therefore method 1 is correct - that's the one you should use. — Allan Cameron, Sep 17 '20 at 19:22
Thank you, understood. As I am trying to eventually do time-dependent survival analysis (where for one time period the patient did not receive the intervention, and for the second time period the patient is receiving the intervention), I thought I had to use the second format (w/ time and time2). I will ask a new question focusing on this now. Thank you for your help. — aparish, Sep 17 '20 at 20:40
I have asked my new question at https://stackoverflow.com/questions/63946293/how-to-do-survival-analysis-in-r-with-time-varying-exposure-to-an-intervention and I would greatly appreciate if you could read it. Thank you. — aparish, Sep 17 '20 at 21:17

score 2 · Answer 1 · answered Sep 17 '20 at 19:19

I don't believe Surv(time = (tstop - tstart), event = status) is equivalent to Surv(time = tstart, time2 = tstop, event = status). The interval between time and time2 isn't the entire observation; it's the time during which death or censoring is known to have happened. So the time and time2 for all death events would both equal tstop - tstart.

The interval is used when you don't know exactly what the time to death or censoring is, but you know it is between two values.
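
The interval-censored coding described above can be sketched as follows. This is a minimal illustration on hypothetical toy values (left, right and s.int are not from the question), assuming the survival package is installed. One caveat worth checking against the Surv help page: as documented there, interval censoring is requested explicitly with type = "interval" or "interval2", whereas supplying time, time2 and event with no type argument defaults to type = "counting".

```r
library(survival)

# Hypothetical toy data in "interval2" coding (not taken from the question):
#   left == right -> event observed exactly at that time
#   right is NA   -> right-censored at left
#   left <  right -> event known only to lie between left and right
left  <- c(8, 12, 3)
right <- c(8, NA, 6)

s.int <- Surv(time = left, time2 = right, type = "interval2")

# Type recorded on the object, and the status code stored for each row:
# 0 = right-censored, 1 = exact event, 3 = interval-censored.
int.type   <- attr(s.int, "type")
int.status <- unclass(s.int)[, 3]

print(s.int)
print(int.type)
print(int.status)
```

Comparing attr(..., "type") for this object with the one built by Method Two is a cheap check on which kind of censoring a given Surv() call actually represents.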

SmokeyShakers

Thank you, I understand. As I am trying to eventually do time-dependent survival analysis (where for one time period the patient did not receive the intervention, and for the second time period the patient is receiving the intervention), I thought I had to use the second format (w/ time and time2). I will ask a new question focusing on this now. Thank you for your help. – aparish Sep 17 '20 at 20:41
I have asked my new question at https://stackoverflow.com/questions/63946293/how-to-do-survival-analysis-in-r-with-time-varying-exposure-to-an-intervention and I would greatly appreciate if you could read it. Thank you. – aparish Sep 17 '20 at 21:17
