I want to calculate the person-time of follow-up by calendar month. In my example, I have three subjects with different lengths of follow-up. I want to know whether the event rates vary by tertile of the calendar year, so I want to sum the time at risk that the subjects spend in each tertile.

library(lubridate)
library(survival)

event <- c(1,1,1)
id <- c(1,2,2)
followup_time <- c(365, 365*2, 365*3)
right.date <- c(ymd("2012-06-01"), ymd("2013-09-01"), ymd("2011-01-01"))
left.date <- right.date - followup_time
tertile <- cut(month(right.date), c(0,4,9,12), include.lowest = TRUE)


df <- data.frame(id, left.date, right.date, followup_time, event, tertile); df
  id  left.date right.date followup_time event tertile
1  1 2011-06-02 2012-06-01           365     1   (4,9]
2  2 2011-09-02 2013-09-01           730     1   (4,9]
3  2 2008-01-02 2011-01-01          1095     1   [0,4]

sum(df$followup_time)
[1] 2190


Using the function pyears() from the survival package in R, I get the following results. The number of subjects and the number of events are correct, but the person-time of follow-up is not what I need.


s <- Surv(time =  followup_time, event = event)

summary(pyears(s ~ tertile , scale = 1))

Call: pyears(formula = s ~ tertile , scale = 1)

number of observations = 3

 month    N   Events   Time  
-------- --- -------- ------ 
 [0,4]    1     1      1095 
 (4,9]    2     2      1095 
 (9,12]   0     0         0 

I expect the following results, which correspond to the sum of the time at risk each subject spent in each of the intervals.

month    N   Events   Time  
-------- --- -------- ------ 
[0,4]    1     1      547.5
(4,9]    2     2      547.5 
(9,12]   0     0      547.5

Some people use the function tcut() from the same package for this kind of person-time calculation, but I could not get satisfactory results with it.

1 Answer

I don't understand the confusion (or maybe it's really simple and nothing to do with survival package functions):

df
#--------
  id  left.date right.date followup_time event tertile
1  1 2011-06-02 2012-06-01           365     1   (4,9]
2  2 2011-09-02 2013-09-01           730     1   (4,9]
3  2 2008-01-02 2011-01-01          1095     1   [0,4]

month(right.date)
#[1] 6 9 1

It has to do with how R's cut function works by default: intervals are closed on the right. In my experience most people expect intervals to be closed on the left. If you want that, recompute tertile with right = FALSE and rebuild the data frame:

> tertile <- cut(month(right.date), c(0,4,9,12), include.lowest = TRUE, right = FALSE)
> df <- data.frame(id, left.date, right.date, followup_time, event, tertile); df
  id  left.date right.date followup_time event tertile
1  1 2011-06-02 2012-06-01           365     1   [4,9)
2  2 2011-09-02 2013-09-01           730     1  [9,12]
3  2 2008-01-02 2011-01-01          1095     1   [0,4)
> s <- with(df, Surv(time =  followup_time, event = event))
> 
> summary(pyears(s ~ tertile , scale = 1))
Call: pyears(formula = s ~ tertile, scale = 1)

number of observations = 3

 tertile   N   Events   Time  
--------- --- -------- ------ 
  [0,4)    1     1      1095 
  [4,9)    1     1       365 
 [9,12]    1     1       730 
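
The effect of the two conventions can be checked directly on the three month values printed above (6, 9, 1). This is only a small sketch; the names `m`, `right_closed` and `left_closed` are made up for the illustration.

```r
# Month numbers of right.date, as printed by month(right.date) above
m <- c(6, 9, 1)
breaks <- c(0, 4, 9, 12)

# Default: intervals closed on the right, so month 9 falls in (4,9]
right_closed <- cut(m, breaks, include.lowest = TRUE)

# right = FALSE: intervals closed on the left, so month 9 falls in [9,12]
left_closed <- cut(m, breaks, include.lowest = TRUE, right = FALSE)

print(data.frame(m, right_closed, left_closed))
```

Only the month that sits exactly on a break (9) changes group; months 6 and 1 stay in the middle and first groups under both conventions.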
IRTFM
  • Your answer is helpful, but it is not what I am looking for. I need the exact time every person spent in each interval. In my example, each interval should have equal times in the final output – Jose Victor Zambrana Mar 13 '20 at 00:53
  • In this example, they teach how to use tcut with age and calendar time, but I cannot find a solution for these monthly periods: https://www.mayo.edu/research/documents/biostat-81pdf/doc-10026981 – Jose Victor Zambrana Mar 13 '20 at 01:16
  • You have not explained why you think each interval should have equal "Time". Seems to me that inspection of the dataframe should show you that is impossible. There's no way to redistribute "Time" over three intervals when the original data only has values in the first two intervals when defined by the use of the default `cut` function. – IRTFM Mar 13 '20 at 20:42
  • (You should also realize that sometimes Therneau uses 365.24 as the number of days in a year. I don't think that is what is going on here, however.) – IRTFM Mar 13 '20 at 20:48
  • https://www.ctspedia.org/wiki/pub/CTSpedia/FollowUpTime/0226_1km.JPG In epidemiology, we compute person-time at risk using the kind of diagram shown in this image. If each line represents each tercile of multiple years, and each row a person under observation, then the sum over all persons for each specific tercile should be approximately equally distributed. At least, that is my reasoning. Maybe I am wrong – Jose Victor Zambrana Mar 17 '20 at 07:03
  • "each line" means what exactly? Presumably you are either referring to the horizontal lines that represent intervals of observation or to the vertical lines representing calendar time. In what way do either of these sets of lines "represent each tercile of multiple years"? Are you using the term "tercile" in the same manner as the rest of the world? How can each line represent "each tercile" (of anything)? – IRTFM Mar 17 '20 at 07:27
  • It was a hypothetical example. Let each vertical line be an interval boundary in calendar time. Each calendar year would be divided into three equal periods (terciles). If I am studying three years, I will have 6 vertical lines. If I sum up the time and events from the first tercile for all the years, I will obtain the average rate of events during the first tercile, taking all years into account. – Jose Victor Zambrana Mar 17 '20 at 19:45
  • That is a completely different meaning of the term tercile from how it is defined by statisticians. The rates will be the number of events divided by the person years at risk. The period incidence will be the number of events divided by the duration of the interval. – IRTFM Mar 18 '20 at 06:28
  • I guess I was wrong about that. *Tertile* would be the right term – Jose Victor Zambrana Mar 19 '20 at 00:20
  • My understanding is that tertile and tercile are synonymous. – IRTFM Mar 21 '20 at 18:00
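
The calculation described in the comments (summing, for each period of the calendar year, the days every subject was under observation) can be sketched in base R without pyears() or tcut(). This is one possible approach, not the method of the answer above. It assumes each subject is at risk on the days after left.date up to and including right.date, and that an event belongs to the period containing right.date. The names `days`, `period`, `person_days` and `result` are made up for the illustration, and `format(x, "%m")` stands in for lubridate's month().

```r
# Data from the question, rebuilt with base R only
followup_time <- c(365, 365 * 2, 365 * 3)
right.date <- as.Date(c("2012-06-01", "2013-09-01", "2011-01-01"))
left.date <- right.date - followup_time
event <- c(1, 1, 1)
breaks <- c(0, 4, 9, 12)

# One element per subject-day at risk.
# Assumption: a subject is at risk on (left.date, right.date], so each
# subject contributes exactly followup_time days.
days <- do.call(c, lapply(seq_along(left.date), function(i) {
  seq(left.date[i] + 1, right.date[i], by = "day")
}))

# Assign every day at risk to a period of the calendar year
period <- cut(as.integer(format(days, "%m")), breaks, include.lowest = TRUE)
person_days <- table(period)

# Assumption: an event is counted in the period containing right.date
event_period <- cut(as.integer(format(right.date, "%m")), breaks,
                    include.lowest = TRUE)
events <- table(event_period[event == 1])

result <- data.frame(period = names(person_days),
                     events = as.vector(events),
                     person_days = as.vector(person_days))
print(result)
```

The person-days add up to sum(followup_time), but they need not be equal across periods: the breaks c(0, 4, 9, 12) define periods of 4, 5 and 3 months, and the subjects enter and leave observation at different points of the year.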