Dear all: I've been trying to create a time-to-event variable. Some time ago I asked here for help, but I've found that the suggested solution does not fully serve my purpose.

Below is my data, together with the variable I want to create, "Time-to-event".

df2 = structure(list(Country = c("USA", "USA", "USA", "USA", "USA", 
"USA", "USA", "USA", "USA", "USA", "USA", "USA", "USA"), year = 2000:2012, 
    Event = c(0L, 0L, 1L, 0L, 0L, 0L, 1L, 0L, 1L, 0L, 0L, 0L, 
    0L), `**Time-to-event**` = c(0L, 1L, 2L, 0L, 1L, 2L, 3L, 
    0L, 1L, 0L, 1L, 2L, 3L)), .Names = c("Country", "year", "Event", 
"**Time-to-event**"), row.names = c(NA, -13L), class = "data.frame")

Country  year              Event      **Time-to-event**
USA      2000               0            0
USA      2001               0            1
USA      2002               1            2
USA      2003               0            0
USA      2004               0            1
USA      2005               0            2
USA      2006               1            3
USA      2007               0            0
USA      2008               1            1
USA      2009               0            0
USA      2010               0            1
USA      2011               0            2
USA      2012               0            3

It was suggested that I use the following code to create the time-to-event variable:

i1 <- with(df2, ave(Event, Country, FUN= 
         function(x) cumsum(c(TRUE, diff(x)<0))))
df2$Time_to_event <- with(df2, ave(i1, i1, Country, FUN= seq_along)-1)

It worked well, but the problem with this code is that it keeps counting when Event = 1 for several years in a row. See below for an example:

Country  year              Event      **Time-to-event**
USA      2000               0            0
USA      2001               0            1
USA      2002               1            2
USA      2003               0            0
USA      2004               1            **1**
USA      2005               1            **2**
USA      2006               1            **3**
USA      2007               0            0
USA      2008               1            1

Instead, when Event stays at 1 in consecutive years, I would like those following years to get a value of zero (0) rather than continuing the count. To be clear, this is how I want the "time-to-event" variable to look:

Country  year              Event      **Time-to-event**
USA      2000               0            0
USA      2001               0            1
USA      2002               1            2
USA      2003               0            0
USA      2004               0            1
USA      2005               1            2
USA      2006               1            0
USA      2007               1            0
USA      2008               1            0
USA      2009               0            0
USA      2010               0            1
FKG

1 Answer

You can use data.table as follows:

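library(data.table)
# Group rows by the number of events seen before each row, then count from 0 within each group
setDT(dat)[, tte := seq.int(0, .N - 1L), by = cumsum(Event) - Event]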

So you end up with:

 > dat
    Country year Event **Time-to-event** tte
 1:     USA 2000     0                 0   0
 2:     USA 2001     0                 1   1
 3:     USA 2002     1                 2   2
 4:     USA 2003     0                 0   0
 5:     USA 2004     0                 1   1
 6:     USA 2005     1                 2   2
 7:     USA 2006     1                 0   0
 8:     USA 2007     1                 0   0
 9:     USA 2008     1                 0   0
10:     USA 2009     0                 0   0
11:     USA 2010     0                 1   1

Why?

Let's have a look at the components:

 > dat[,.(Event, cumsum = cumsum(Event), run = cumsum(Event)-Event)]
    Event cumsum run
 1:     0      0   0
 2:     0      0   0
 3:     1      1   0
 4:     0      1   1
 5:     0      1   1
 6:     1      2   1
 7:     1      3   2
 8:     1      4   3
 9:     1      5   4
10:     0      5   5
11:     0      5   5

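Subtracting Event from cumsum(Event) gives, for each row, the number of events that occurred before it, which serves as the number of the run. An event row therefore shares its run number with the zeros leading up to it, while each further consecutive event starts a new run of length one, so its counter resets to 0. Grouping by this sequence is what makes it work.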
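The grouping idea can also be checked without data.table. The following is only a minimal base R sketch, not the answer's own code: it uses `ave` (which the question already uses) as a stand-in for the `by =` grouping, and the `Event` vector is copied from the asker's desired-output table.

```r
# Event column from the asker's desired-output table (years 2000-2010)
Event <- c(0L, 0L, 1L, 0L, 0L, 1L, 1L, 1L, 1L, 0L, 0L)

# Number of events strictly before each row: the grouping key from the answer
run <- cumsum(Event) - Event

# Count 0, 1, 2, ... within each run (base R stand-in for seq.int(0, .N - 1L))
tte <- ave(Event, run, FUN = seq_along) - 1L

print(data.frame(Event, run, tte))
```

The `run` column should match the table above, and `tte` restarts at 0 after every event, including each repeated event.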

Rentrop
  • Hi @Floo0 and thanks for sharing this with me. I tried this and it gives this error: **unused argument (by = rleid(cumsum(Event) - Event))**. What does "tte" mean? (Time to event, got it!) – FKG Mar 22 '16 at 14:41
  • `cumsum(shift(Event, fill=1L))` is another thing you could put in `by=`. – Frank Mar 22 '16 at 14:43
  • @FKG you need to `setDT(df2)` first for that error to go away. `tte` just abbreviates "time to event", I guess. – Frank Mar 22 '16 at 14:44
  • Thanks for this @Floo0. It didn't work initially and just counted observations in my data. Then I took out all NAs and it worked well. Is there any way to account for NAs? – FKG Mar 22 '16 at 15:04
  • Dear @Frank and Floo0, do you know how to do the same calculation for each state? I have about 170 countries in my data. The count should start from 0 for each state. How would you specify this in the suggested code above? – FKG Mar 23 '16 at 13:48
  • You can specify multiple columns in `by`. So you could do `by = .(state, cumsum(Event)-Event)` – Rentrop Mar 23 '16 at 13:57
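For the per-country case raised in the last comments, here is a hedged sketch rather than a definitive implementation. The two-country panel is made up for illustration, and the helper column `run` is an assumption that does not appear in the thread. As far as I can tell, an expression written directly in `by` is evaluated over the whole table, so the one-liner relies on each country's rows being contiguous; computing the run key within each country first avoids that dependence.

```r
library(data.table)

# Hypothetical two-country panel; the rows are deliberately interleaved by year
dat <- data.table(
  Country = rep(c("USA", "CAN"), times = 4),
  year    = rep(2000:2003, each = 2),
  Event   = c(0L, 1L, 0L, 1L, 1L, 0L, 1L, 0L)
)

# Sort so each country's years are contiguous and in chronological order
setorder(dat, Country, year)

# Run key (number of earlier events) computed separately for each country
dat[, run := cumsum(Event) - Event, by = Country]

# The counter restarts at 0 for every (Country, run) pair
dat[, tte := seq_len(.N) - 1L, by = .(Country, run)]

print(dat)
```

With this layout the counter starts from 0 in each country's first year and resets after every event, independently of the other countries.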