
I need to create lead and lag variables in R, as shown below.

Suppose I have a data frame with details of each customer's visits to a store:

CustomerID  Dateofvisit
1           1/2/2013
1           1/3/2013
1           1/7/2013
2           1/9/2013
2           1/14/2013
2           2/14/2013
3           1/4/2013
3           1/5/2013

As we can see, there are 3 customers with different visit dates. When I apply a lag function to the above (I wrote my own function), the result looks like this:

CustomerID  Dateofvisit  Laggeddate
1           1/2/2013     -
1           1/3/2013     1/2/2013
1           1/7/2013     1/3/2013
2           1/9/2013     1/7/2013
2           1/14/2013    1/9/2013
2           2/14/2013    1/14/2013
3           1/4/2013     2/14/2013
3           1/5/2013     1/4/2013

But I want to lag within each customer as well. So for the 4th row, the lagged date should be empty. Similarly, for the 3rd customer, the first row should be empty and the last row should show 1/4/2013. How do I do this?

The following is the code I use for lag/lead:

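# Shift a numeric vector: positive shift_by gives a lead, negative a lag.
shift <- function(x, shift_by) {
    stopifnot(is.numeric(shift_by))
    stopifnot(is.numeric(x))

    # Several shifts at once: return one column per value of shift_by
    if (length(shift_by) > 1)
        return(sapply(shift_by, shift, x = x))

    abs_shift_by <- abs(shift_by)
    # Lead drops leading values and pads the end with NA; lag does the reverse
    if (shift_by > 0)
        out <- c(tail(x, -abs_shift_by), rep(NA, abs_shift_by))
    else if (shift_by < 0)
        out <- c(rep(NA, abs_shift_by), head(x, -abs_shift_by))
    else
        out <- x
    out
}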

And this is how I lead/lag them:

#generate lead by 1 variable 
test$df_lead2<-shift(test$x,1) 
#generate lag by 1 variable 
test$df_lag2<-shift(test$x,-1) 

My desired output is:

CustomerID  Dateofvisit  Laggeddate
1           1/2/2013     -
1           1/3/2013     1/2/2013
1           1/7/2013     1/3/2013
2           1/9/2013     -
2           1/14/2013    1/9/2013
2           2/14/2013    1/14/2013
3           1/4/2013     -
3           1/5/2013     1/4/2013
  • I *think* I understand your description in the text of the expected output for lag, but you should add an example of the desired output (similar to the failed attempt that you included) both for lag and lead. – Henrik Aug 28 '13 at 12:37

1 Answer

Is this what you want?

library(plyr)
ddply(.data = df, .variables = .(CustomerID), mutate,
   lagdate = c(NA, head(Dateofvisit, -1)),
   leaddate = c(tail(Dateofvisit, -1), NA))
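
For comparison, the same per-customer lag and lead can be sketched in base R with ave(), which applies a function within each group and returns the values in the original row order. This is only an illustrative alternative, not part of the plyr code above. It rebuilds the question's sample data as `df` and assumes `Dateofvisit` is stored as character strings rather than as Date or factor.

```r
# Sample data from the question; dates kept as character strings (an assumption).
df <- data.frame(
  CustomerID  = c(1, 1, 1, 2, 2, 2, 3, 3),
  Dateofvisit = c("1/2/2013", "1/3/2013", "1/7/2013", "1/9/2013",
                  "1/14/2013", "2/14/2013", "1/4/2013", "1/5/2013"),
  stringsAsFactors = FALSE
)

# Lag within each customer: prepend NA and drop the group's last value.
df$lagdate <- ave(df$Dateofvisit, df$CustomerID,
                  FUN = function(x) c(NA, head(x, -1)))

# Lead within each customer: drop the group's first value and append NA.
df$leaddate <- ave(df$Dateofvisit, df$CustomerID,
                   FUN = function(x) c(tail(x, -1), NA))

print(df)
```

The first row of each customer gets NA in `lagdate` and the last row gets NA in `leaddate`, which corresponds to the "-" entries in the desired output.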
Henrik