I'm trying to use pyears to estimate incidence in a cohort where one of my covariates of interest is age at the time of the event rather than age at enrolment (i.e. enrolment cohort). Age at event is of course time-dependent. The correct way to handle this appears to be applying tcut to age at enrolment, as illustrated in the help for pyears. However, this only seems to work when the start time is always zero (or, equivalently, when the Surv object is given follow-up time rather than start/end times). For my scenario it is important to use the actual start/end times, because I also want to include other time-varying covariates such as calendar year.
Here is an example to illustrate the problem:
library(tidyverse)
library(survival)
# encode actual start/end dates
s1 <- tibble(stime = as.numeric(as.Date("2000-01-01")) + 1:10,
             etime = stime + 365.25,
             futime = etime - stime,
             outcome = c(1,1,1,0,0,0,0,0,0,0),
             age.enr = floor(runif(10, 15, 64.999)))
# encode time elapsed from origin of zero
s2 <- tibble(stime = 0,
             etime = stime + 365.25,
             futime = etime - stime,
             outcome = c(1,1,1,0,0,0,0,0,0,0),
             age.enr = floor(runif(10, 15, 64.999)))
# these ought to give the same results, but don't (the second one appears to be right)
pyears(Surv(stime, etime, outcome) ~ tcut(age.enr, c(0, 24, 34, 44, 54, 64), scale=365.25), data=s1, scale=1)$pyears
pyears(Surv(futime, outcome) ~ tcut(age.enr, c(0, 24, 34, 44, 54, 64), scale=365.25), data=s1, scale=1)$pyears
# test it with a dataset where start time is always zero - works
pyears(Surv(stime, etime, outcome) ~ tcut(age.enr, c(0, 24, 34, 44, 54, 64), scale=365.25), data=s2, scale=1)$pyears
pyears(Surv(futime, outcome) ~ tcut(age.enr, c(0, 24, 34, 44, 54, 64), scale=365.25), data=s2, scale=1)$pyears
This results in:
> # these ought to give the same results, but don't (the second one appears to be right)
> pyears(Surv(stime, etime, outcome) ~ tcut(age.enr, c(0, 24, 34, 44, 54, 64), scale=365.25), data=s1, scale=1)$pyears
tcut(age.enr, c(0, 24, 34, 44, 54, 64), scale = 365.25)
 0+ thru 24 24+ thru 34 34+ thru 44 44+ thru 54 54+ thru 64
       0.00        0.00        0.00        0.00      365.25
> pyears(Surv(futime, outcome) ~ tcut(age.enr, c(0, 24, 34, 44, 54, 64), scale=365.25), data=s1, scale=1)$pyears
tcut(age.enr, c(0, 24, 34, 44, 54, 64), scale = 365.25)
 0+ thru 24 24+ thru 34 34+ thru 44 44+ thru 54 54+ thru 64
       0.00      365.25      730.50     1461.00      730.50
>
> # test it with a dataset where start time is always zero - works
> pyears(Surv(stime, etime, outcome) ~ tcut(age.enr, c(0, 24, 34, 44, 54, 64), scale=365.25), data=s2, scale=1)$pyears
tcut(age.enr, c(0, 24, 34, 44, 54, 64), scale = 365.25)
 0+ thru 24 24+ thru 34 34+ thru 44 44+ thru 54 54+ thru 64
     730.50     1095.75     1095.75      730.50        0.00
> pyears(Surv(futime, outcome) ~ tcut(age.enr, c(0, 24, 34, 44, 54, 64), scale=365.25), data=s2, scale=1)$pyears
tcut(age.enr, c(0, 24, 34, 44, 54, 64), scale = 365.25)
 0+ thru 24 24+ thru 34 34+ thru 44 44+ thru 54 54+ thru 64
     730.50     1095.75     1095.75      730.50        0.00
With the first dataset (s1), the call that supplies start/end times gives the wrong person-time, while the call that supplies elapsed time gives what looks like the right answer. With the second dataset (s2), both calls agree, because the start time is artificially set to zero.
I realize that using elapsed time is a work-around for this scenario, but shouldn't pyears + tcut behave the same regardless of how the intervals are encoded? Am I misunderstanding what tcut is supposed to do?
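For reference, the person-time one would expect in each attained-age band can be tallied by hand, independently of pyears. The following is only a minimal base-R sketch of that bookkeeping, not what pyears or tcut do internally. The helper name age_band_pdays is hypothetical, and it assumes that age at enrolment is given in whole years at the start of follow-up, that follow-up is in days, and that subjects contribute nothing once they are older than the last break.

```r
# Hypothetical helper (not part of the survival package):
# person-days of follow-up falling in each attained-age band.
age_band_pdays <- function(age_enr_years, futime_days,
                           breaks_years = c(0, 24, 34, 44, 54, 64)) {
  brk   <- breaks_years * 365.25      # band boundaries, in days of age
  entry <- age_enr_years * 365.25     # age in days at start of follow-up (assumption)
  exit  <- entry + futime_days        # age in days at end of follow-up
  lower <- head(brk, -1)
  upper <- tail(brk, -1)
  # For each band, sum over subjects the overlap between the subject's
  # age interval [entry, exit] and the band [lower, upper].
  pdays <- vapply(seq_along(lower), function(k) {
    sum(pmax(0, pmin(exit, upper[k]) - pmax(entry, lower[k])))
  }, numeric(1))
  names(pdays) <- paste0(head(breaks_years, -1), "+ thru ", tail(breaks_years, -1))
  pdays
}

# Two made-up subjects, aged 23 and 40 at enrolment, each followed for 2 years
res <- age_band_pdays(c(23, 40), c(730.5, 730.5))
print(res)
```

In this made-up example the first subject splits their two years between the first two bands, and the second subject stays in the 34+ thru 44 band throughout. Only the length of follow-up and the age at its start enter the calculation, so the result cannot depend on whether the interval is encoded as calendar start/end times or as elapsed time.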
thanks, Peter