0

I have a data frame that looks like this:

 id     date
 1001   2012-10-11
 1005   2013-02-20
 1005   2012-11-21
 1005   2014-03-14
 1003   2013-10-25
 1003   2013-11-30

I need to find, for each row, the number of days that have passed since the last occurrence of that id. For the above example, the answer would look like this:

 id     date        no_of_days
 1001   2012-10-11  NA
 1005   2013-02-20  91
 1005   2012-11-21  NA
 1005   2014-03-14  387
 1003   2013-10-25  NA
 1003   2013-11-30  36

A bit of searching got me to the point where I can add a new column with values that were generated by applying a function on subgroups (the R equivalent of Ststa's bysort):

df$no_of_days<-with(df,ave(id,id,FUN=days_passed,na.rm=TRUE))

However, defining the new function days_passed is proving to be tricky as I have to find the last occurrence of that uniqid, and then formulate the function accordingly.

I'm new to R, so any help on this would be greatly appreciated.

shrgm
  • 1,315
  • 1
  • 10
  • 20
  • This is a good question on R programming but doesn't seem to have a statistical aspect - is it a candidate for migration to SO? – Silverfish Nov 23 '14 at 18:01

1 Answers1

5

I exclusively use data.table, so this is an answer using data.table...

library(data.table)
dt <- data.table(id=c(1001,1005,1005,1005,1003,1003),
date=as.IDate(c("2012-10-11","2013-02-20","2012-11-21",
"2014-03-14", "2013-10-25","2013-11-30")))

setkeyv(dt, c("id","date"))
dt[,delta:=c(NA,diff(date)),by=id]
dt
     id       date delta
1: 1001 2012-10-11    NA
2: 1003 2013-10-25    NA
3: 1003 2013-11-30    36
4: 1005 2012-11-21    NA
5: 1005 2013-02-20    91
6: 1005 2014-03-14   387

Function setkeyv both indexes and sorts dt on id and date. Then we compute the delta by stepping over the indexed values in id and computing the first difference of date. Because first differences of $n$ values will yield $n-1$ entires, we concatenate NA to the result.

Conveniently, the use of data.tableis very fast, even for large objects.

Sycorax
  • 1,298
  • 13
  • 26