
I am running into a roadblock in my survival analysis; I think it has to do with the censoring type. Here are the first 30 rows of my survival data. tstart is when a patient is admitted and starts receiving the Intervention; tstop is either death (status = 1) or discharge (censored, status = 0):

    tstart tstop status Intervention
1        2    14      0        FALSE
2        2     5      0        FALSE
3        2    10      1        FALSE
4        5     8      0        FALSE
5        6    10      0        FALSE
6        6    10      0        FALSE
7        7    10      0        FALSE
8        8    20      1         TRUE
9        8    25      0        FALSE
10       8    18      0        FALSE
11       8    11      0        FALSE
12       8     9      0        FALSE
13       9    11      0        FALSE
14       9    52      0         TRUE
15       9    26      1        FALSE
16      10    20      1         TRUE
17      10    14      0        FALSE
18      10    14      0        FALSE
19      10    11      0        FALSE
20      10    23      0         TRUE
21      10    26      0         TRUE
22      10    16      0        FALSE
23      11    21      0         TRUE
24      11    96      0         TRUE
25      11    14      0        FALSE
26      11    16      0         TRUE
27      11    14      0        FALSE
28      11    16      0        FALSE
29      11    16      0        FALSE
30      11    38      1         TRUE

Depending on how I enter this data into the coxph function, I get two different results. Namely:

# METHOD ONE:
> coxph (Surv (time = (tstop - tstart), event = status) ~ Intervention, data = df.use)
Call:
coxph(formula = Surv(time = (tstop - tstart), event = status) ~ 
    Intervention, data = df.use)

                     coef exp(coef) se(coef)      z     p
InterventionTRUE -0.05975   0.94200  0.04727 -1.264 0.206

Likelihood ratio test=1.58  on 1 df, p=0.2084
n= 7362, number of events= 2364 

# METHOD TWO:
> coxph (Surv (time = tstart, time2 = tstop, event = status) ~ Intervention, data = df.use)
Call:
coxph(formula = Surv(time = tstart, time2 = tstop, event = status) ~ 
    Intervention, data = df.use)

                     coef exp(coef) se(coef)      z             p
InterventionTRUE -0.29936   0.74129  0.04902 -6.106 0.00000000102

Likelihood ratio test=35.67  on 1 df, p=0.000000002337
n= 7362, number of events= 2364 

I thought the two methods would return the same hazard ratio, but the results are extremely different. Why is this? How can it be avoided?

aparish
  • This is more of a stats question than a programming question. The short answer is that the two methods fit different models. time2 is only for interval-censored or counting-process (start, stop] data. Your study design implies right-censored data measured from admission, so method 1 is correct; that's the one you should use. – Allan Cameron Sep 17 '20 at 19:22
  • Thank you, understood. As I am trying to eventually do time-dependent survival analysis (where for one time period the patient did not receive the intervention, and for the second time period the patient is receiving the intervention), I thought I had to use the second format (w/ time and time2). I will ask a new question focusing on this now. Thank you for your help. – aparish Sep 17 '20 at 20:40
  • I have asked my new question at https://stackoverflow.com/questions/63946293/how-to-do-survival-analysis-in-r-with-time-varying-exposure-to-an-intervention and I would greatly appreciate if you could read it. Thank you. – aparish Sep 17 '20 at 21:17

1 Answer


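Surv(time = (tstop - tstart), event = status) is not equivalent to Surv(time = tstart, time2 = tstop, event = status). When time2 is supplied together with a 0/1 event, Surv defaults to type = "counting": each row is read as a (tstart, tstop] interval on the original clock, and the patient only enters the risk set at tstart (delayed entry). The time and time2 pair is not a way of passing the length of the observation. Method one compares patients at the same number of days since admission, while method two compares patients who are in hospital on the same day of the tstart/tstop clock, so the risk sets at each death differ and so does the hazard ratio.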
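To see why the two calls are not equivalent, it can help to count who is at risk when a given death occurs under each encoding. The sketch below uses only base R and the 30 rows shown in the question (re-typed here, so treat the vectors as an illustration rather than the full df.use). It assumes the three-argument call is read as (start, stop] counting-process data, and the variable names at_risk_duration and at_risk_counting are made up for this example.

```r
# First 30 rows from the question, re-typed for illustration
tstart <- c(2, 2, 2, 5, 6, 6, 7, 8, 8, 8, 8, 8, 9, 9, 9,
            10, 10, 10, 10, 10, 10, 10, 11, 11, 11, 11, 11, 11, 11, 11)
tstop  <- c(14, 5, 10, 8, 10, 10, 10, 20, 25, 18, 11, 9, 11, 52, 26,
            20, 14, 14, 11, 23, 26, 16, 21, 96, 14, 16, 14, 16, 16, 38)

# Time scale used by method one: days since admission
duration <- tstop - tstart

# Row 3 is a death: 8 days after admission, at tstop = 10
# Method one: at risk = everyone followed for at least 8 days
at_risk_duration <- sum(duration >= duration[3])

# Method two: at risk = everyone with tstart < 10 <= tstop,
# i.e. admitted before day 10 and still under observation on day 10
at_risk_counting <- sum(tstart < tstop[3] & tstop >= tstop[3])

at_risk_duration  # 13
at_risk_counting  # 12
```

The two sets also differ in membership, not just in size: rows 5 to 7 are still in hospital on day 10 but were followed for fewer than 8 days, so they count toward the second set only. Each death is therefore compared against a different group of patients, which is enough to change the partial likelihood.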

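Interval censoring is used when you don't know exactly what the time to death is, but you know it lies between two values. In Surv that requires type = "interval" or type = "interval2" explicitly; with type = "interval2", an exactly observed death has time and time2 both equal to tstop - tstart. Note that coxph does not fit interval-censored data.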
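One way to check how Surv interpreted the arguments is to look at the type attribute of the object it returns. This is a small sketch assuming the survival package is installed; the data frame holds the first five rows from the question and the names s_right and s_count are only for illustration.

```r
library(survival)

# First five rows of the question's data, re-typed for illustration
df <- data.frame(
  tstart = c(2, 2, 2, 5, 6),
  tstop  = c(14, 5, 10, 8, 10),
  status = c(0, 0, 1, 0, 0)
)

# Method one: a single time argument gives right-censored data
s_right <- with(df, Surv(time = tstop - tstart, event = status))

# Method two: time plus time2 with a 0/1 event gives counting-process data
s_count <- with(df, Surv(time = tstart, time2 = tstop, event = status))

attr(s_right, "type")  # "right"
attr(s_count, "type")  # "counting"
```

Neither object is interval-censored, so the choice between the two calls is really a choice of time scale and risk set rather than of censoring type.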

SmokeyShakers
  • Thank you, I understand. As I am trying to eventually do time-dependent survival analysis (where for one time period the patient did not receive the intervention, and for the second time period the patient is receiving the intervention), I thought I had to use the second format (w/ time and time2). I will ask a new question focusing on this now. Thank you for your help. – aparish Sep 17 '20 at 20:41
  • I have asked my new question at https://stackoverflow.com/questions/63946293/how-to-do-survival-analysis-in-r-with-time-varying-exposure-to-an-intervention and I would greatly appreciate if you could read it. Thank you. – aparish Sep 17 '20 at 21:17