
I'm trying to use pyears to estimate incidence in a cohort where one of my covariates of interest is age at time of event (rather than age at enrolment, i.e. enrolment cohort). Age at event is of course time-dependent. The correct way to do this appears to be using tcut on age at enrolment as illustrated in the help for pyears. However, it appears to only work when the start time is always zero (or you use the equivalent approach of providing a Surv object with follow-up time rather than start/end times). For my scenario, it is important to use the actual start/end times because I also want to use other time-varying covariates like calendar year.

Here is an example to illustrate the problem:

library(tidyverse)
library(survival)

# encode actual start/end dates
s1 <- tibble(stime = as.numeric(as.Date("2000-01-01")) + 1:10,
             etime = stime + 365.25,
             futime = etime - stime,
             outcome = c(1,1,1,0,0,0,0,0,0,0),
             age.enr = floor(runif(10, 15, 64.999)))

# encode time elapsed from origin of zero
s2 <- tibble(stime = 0,
             etime = stime + 365.25,
             futime = etime - stime,
             outcome = c(1,1,1,0,0,0,0,0,0,0),
             age.enr = floor(runif(10, 15, 64.999)))

# these ought to give the same results, but don't (the second one appears to be right)
pyears(Surv(stime, etime, outcome) ~ tcut(age.enr, c(0, 24, 34, 44, 54, 64), scale=365.25), data=s1, scale=1)$pyears
pyears(Surv(futime, outcome) ~ tcut(age.enr, c(0, 24, 34, 44, 54, 64), scale=365.25), data=s1, scale=1)$pyears

# test it with a dataset where start time is always zero - works
pyears(Surv(stime, etime, outcome) ~ tcut(age.enr, c(0, 24, 34, 44, 54, 64), scale=365.25), data=s2, scale=1)$pyears
pyears(Surv(futime, outcome) ~ tcut(age.enr, c(0, 24, 34, 44, 54, 64), scale=365.25), data=s2, scale=1)$pyears

This results in:

> # these ought to give the same results, but don't (the second one appears to be right)
> pyears(Surv(stime, etime, outcome) ~ tcut(age.enr, c(0, 24, 34, 44, 54, 64), scale=365.25), data=s1, scale=1)$pyears
tcut(age.enr, c(0, 24, 34, 44, 54, 64), scale = 365.25)
 0+ thru 24 24+ thru 34 34+ thru 44 44+ thru 54 54+ thru 64 
       0.00        0.00        0.00        0.00      365.25 
> pyears(Surv(futime, outcome) ~ tcut(age.enr, c(0, 24, 34, 44, 54, 64), scale=365.25), data=s1, scale=1)$pyears
tcut(age.enr, c(0, 24, 34, 44, 54, 64), scale = 365.25)
 0+ thru 24 24+ thru 34 34+ thru 44 44+ thru 54 54+ thru 64 
       0.00      365.25      730.50     1461.00      730.50 
> 
> # test it with a dataset where start time is always zero - works
> pyears(Surv(stime, etime, outcome) ~ tcut(age.enr, c(0, 24, 34, 44, 54, 64), scale=365.25), data=s2, scale=1)$pyears
tcut(age.enr, c(0, 24, 34, 44, 54, 64), scale = 365.25)
 0+ thru 24 24+ thru 34 34+ thru 44 44+ thru 54 54+ thru 64 
     730.50     1095.75     1095.75      730.50        0.00 
> pyears(Surv(futime, outcome) ~ tcut(age.enr, c(0, 24, 34, 44, 54, 64), scale=365.25), data=s2, scale=1)$pyears
tcut(age.enr, c(0, 24, 34, 44, 54, 64), scale = 365.25)
 0+ thru 24 24+ thru 34 34+ thru 44 44+ thru 54 54+ thru 64 
     730.50     1095.75     1095.75      730.50        0.00 

The first example fails when providing the start/end times but works when providing the elapsed time, while the second example works under both start/end or elapsed time (because the start time is artificially set to zero).

I realize that using elapsed time is a work-around for this scenario, but shouldn't pyears + tcut behave the same regardless of how the intervals are encoded? Am I misunderstanding what tcut is supposed to do?

thanks, Peter

Peter Young
  • I don't think this is anything to do with `tcut`. The problem is with the Surv model. When you pass in a start time and end time, `Surv` assumes you are using interval censoring rather than right censoring. What puzzles me is why you need the actual start and end times in the model. You can have the entire `Surv` object as a column in your data frame, and pass any covariates you like to a model. It would be much easier to do this than try to coerce `Surv` to do something unusual, _and_ try to extract covariates from it. – Allan Cameron Mar 11 '20 at 10:36
  • Actually, in addition to the issue pointed out by @AllanCameron, I noticed another conceptual flaw. My follow-up time has gaps due to temporary out-migration, so I code the intervals with start and end times because the same individual can be followed over multiple intervals on the timeline. However, there is no way for tcut to know the enrolment date in this example, so what I really need to run tcut on is the 'age at beginning of interval'. I've converted my code to use that approach. – Peter Young Mar 12 '20 at 13:16

1 Answer


My goal of correctly tabulating age requires specifying the age at the beginning of each interval rather than the age at some prior enrolment date, as shown here:

# another example, using DOB which is truly constant
set.seed(1234)
s1 <- tibble(stime = as.numeric(as.Date("2000-01-01")) + 1:10,
             etime = stime + 3652.50,
             outcome = c(1,1,1,0,0,0,0,0,0,0),
             dob = round(runif(10, as.Date("1930-01-01"), 
                               as.Date("1985-01-01"))),
             age.enr = floor((stime - dob)/365.25),
             age.end = floor((etime - dob)/365.25),
             sobj = Surv(etime - stime, outcome)) # just for convenience
summary(s1)
s1 %>% mutate_at(vars(stime, etime, dob), ~as.Date(.x, origin="1970-01-01"))

s1$enrd <- s1$stime - 365.25*3               # simulate an enrolment date 3 years prior to this interval
s1$age.int <- s1$age.enr                     # this is actually the age at the beginning of the interval, not at enrolment
s1$age.enr <- floor((s1$enrd - s1$dob)/365.25)

pyears(sobj ~ tcut(age.enr, c(0, 25, 35, 45, 55, 65,999), scale=365.25), data=s1)$pyears # incorrect
pyears(sobj ~ tcut(age.int, c(0, 25, 35, 45, 55, 65,999), scale=365.25), data=s1)$pyears # correct

Cutting on 'age.int' seems to give the desired behavior. I've also (I think) incorporated the recommendation from @AllanCameron to just store the Surv object in the data.frame.
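Since the original motivation was to stratify by calendar year as well, the same "value at the beginning of the interval" idea might be extended along these lines. This is only a minimal sketch, not a tested solution for the real cohort: the data frame `fu`, its columns (`dob`, `start`, `end`, `cal.int`) and the cut points are all made up for illustration, and it uses elapsed follow-up time in `Surv` rather than start/end times, with one row per follow-up interval to allow for gaps.

```r
library(survival)

# Hypothetical long-format data: one row per follow-up interval,
# so person 1 contributes two intervals separated by a gap.
fu <- data.frame(
  id      = c(1, 1, 2),
  dob     = as.Date(c("1970-06-15", "1970-06-15", "1985-03-01")),
  start   = as.Date(c("2000-01-01", "2003-01-01", "2001-07-01")),
  end     = as.Date(c("2001-01-01", "2005-01-01", "2004-07-01")),
  outcome = c(0, 1, 0)
)

# Every time-dependent quantity is evaluated at the start of the interval
fu$futime  <- as.numeric(fu$end - fu$start)           # elapsed days in the interval
fu$age.int <- as.numeric(fu$start - fu$dob) / 365.25  # age in years at interval start
fu$cal.int <- as.numeric(fu$start)                    # calendar date, days since 1970-01-01

# Calendar-year cut points on the same day scale as cal.int
yr.breaks <- as.numeric(as.Date(paste0(2000:2006, "-01-01")))

fit <- pyears(Surv(futime, outcome) ~
                tcut(age.int, c(0, 25, 35, 45, 55, 65), scale = 365.25) +
                tcut(cal.int, yr.breaks, labels = as.character(2000:2005)),
              data = fu, scale = 1)

fit$pyears    # person-days by age band and calendar year
fit$offtable  # follow-up that fell outside every cell
```

Adding `fit$offtable` to the sum of `fit$pyears` and comparing against `sum(fu$futime)` is a quick way to check that no follow-up has been lost or silently shifted into the wrong cells.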

Peter Young