I am trying to identify sessions in clickstream data. I group rows by Month and user ID (the ID column) and try to create a new variable, session, based on the diff_days column: it should increase by one if diff_days is > 0.00209 and stay at the previous value otherwise. So basically I am trying to create the session variable and use its lagged version at the same time. The first row in a group always has session = 1.
For example, take this data, which is one of the groups produced by group_by:
ID Month diff_days
2 0 NA
2 0 0.0002
2 0 0.001
2 0 0.01
2 0 0.00034
2 0 0.1
2 0 0.3
2 0 0.00005
and I want to create the session variable within each group like this:
ID Month diff_days session
2 0 NA 1
2 0 0.0002 1
2 0 0.001 1
2 0 0.01 2
2 0 0.00034 2
2 0 0.1 3
2 0 0.3 4
2 0 0.00005 4
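A minimal base-R sketch of the rule exactly as worded, applied to a single group's diff_days vector. It assumes the rows are already in time order, uses the 0.00209 threshold from the description, and treats a missing gap (the NA in the first row) as "no new session". `assign_sessions` is a hypothetical helper name, not an existing function.

```r
# Hypothetical helper: walk one group's rows in order and apply the rule
# "new session if the gap exceeds the threshold, else keep the previous one".
assign_sessions <- function(diff_days, threshold = 0.00209) {
  n <- length(diff_days)
  session <- integer(n)
  if (n == 0) return(session)
  session[1] <- 1L  # the first row of a group always starts session 1
  for (i in seq_len(n)[-1]) {
    gap <- diff_days[i]
    if (!is.na(gap) && gap > threshold) {
      session[i] <- session[i - 1] + 1L  # gap too large: open a new session
    } else {
      session[i] <- session[i - 1]       # otherwise carry the session forward
    }
  }
  session
}

assign_sessions(c(NA, 0.0002, 0.001, 0.01, 0.00034, 0.1, 0.3, 0.00005))
# [1] 1 1 1 2 2 3 4 4
```

This reproduces the session column of the example group, which confirms the expected output corresponds to a threshold of 0.00209.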
This is the code I am using, which does not give the right answer:
data <- data %>%
  group_by(ID, Month) %>%
  mutate(session = ifelse(row_number() == 1, 1,
                          ifelse(diff_days < 0.00209, lag(session), lag(session) + 1))) %>%
  ungroup()
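One likely reason the code above cannot work: mutate() evaluates `lag(session)` on the whole column at once, before any value of the new `session` column exists, so it cannot feed each row's result into the next. A common vectorized alternative (swapped in here, not taken from the question) is to flag the rows that start a new session and take a cumulative sum of those flags. The sketch below assumes rows are already sorted by time within each ID/Month group; `new_session` is a hypothetical helper column.

```r
library(dplyr)

# Example group from the question (ID 2, Month 0).
data <- data.frame(
  ID = 2,
  Month = 0,
  diff_days = c(NA, 0.0002, 0.001, 0.01, 0.00034, 0.1, 0.3, 0.00005)
)

threshold <- 0.00209  # session gap taken from the question text

data <- data %>%
  group_by(ID, Month) %>%
  mutate(
    # TRUE where a row opens a new session; the first row of each group
    # and rows with a missing gap never do
    new_session = row_number() > 1 & !is.na(diff_days) & diff_days > threshold,
    # session = 1 plus the number of session breaks seen so far in the group
    session = 1L + cumsum(new_session)
  ) %>%
  ungroup() %>%
  select(-new_session)

data$session
# [1] 1 1 1 2 2 3 4 4
```

Because cumsum() restarts inside each group of a grouped mutate(), every ID/Month combination gets its own numbering starting at 1.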
I have been struggling with this for quite some time, so any help would be greatly appreciated.
Thanks!