I am trying to compare items between consecutive groups in a data frame. I suspect this is pretty easy when you know what you are doing...
My data set can be represented as follows:
set.seed(1)
data <- data.frame(
  date = c(rep('2015-02-01', 15), rep('2015-02-02', 16), rep('2015-02-03', 15)),
  id = as.character(c(
    1005 + sample.int(10, 15, replace = TRUE),
    1005 + sample.int(10, 16, replace = TRUE),
    1005 + sample.int(10, 15, replace = TRUE)
  ))
)
Which yields a data frame that looks like this (the last few rows are omitted):
date       id
2015-02-01 1008
2015-02-01 1009
2015-02-01 1011
2015-02-01 1015
2015-02-01 1008
2015-02-01 1014
2015-02-01 1015
2015-02-01 1012
2015-02-01 1012
2015-02-01 1006
2015-02-01 1008
2015-02-01 1007
2015-02-01 1012
2015-02-01 1009
2015-02-01 1013
2015-02-02 1010
2015-02-02 1013
2015-02-02 1015
2015-02-02 1009
2015-02-02 1013
2015-02-02 1015
2015-02-02 1008
2015-02-02 1012
2015-02-02 1007
2015-02-02 1008
2015-02-02 1009
2015-02-02 1006
2015-02-02 1009
2015-02-02 1014
2015-02-02 1009
2015-02-02 1010
2015-02-03 1011
2015-02-03 1010
2015-02-03 1007
2015-02-03 1014
2015-02-03 1012
2015-02-03 1013
2015-02-03 1007
2015-02-03 1013
2015-02-03 1010
I then want to group the data by date (group_by), remove duplicate ids within each day (distinct), and compare consecutive groups. The goal is to determine, from day to day, which ids are new and which have dropped out. So day 1 and day 2 would be compared to find the ids in day 2 that were not in day 1 and the ids in day 1 that are absent from day 2; the same comparison would then be made between day 2 and day 3, and so on.
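To make the desired result concrete, here is a rough base R sketch of the day-to-day comparison on a tiny made-up example. The data frame toy and the names ids_by_day, days and changes are illustrative assumptions, not part of the real data set, and this loop-style approach is only one possible way to express the comparison:

```r
# Toy data, invented purely for illustration
toy <- data.frame(
  date = c("2015-02-01", "2015-02-01", "2015-02-01",
           "2015-02-02", "2015-02-02",
           "2015-02-03", "2015-02-03", "2015-02-03"),
  id   = c("a", "b", "b", "b", "c", "c", "d", "a"),
  stringsAsFactors = FALSE
)

# Unique ids per day; split() orders the groups by the sorted date strings,
# which is chronological here because the dates are in ISO format
ids_by_day <- lapply(split(toy$id, toy$date), unique)
days <- names(ids_by_day)

# Compare every day with the day immediately before it
changes <- lapply(seq_along(days)[-1], function(i) {
  prev <- ids_by_day[[i - 1]]
  curr <- ids_by_day[[i]]
  list(
    day   = days[i],
    added = setdiff(curr, prev),  # ids present today but not the day before
    lost  = setdiff(prev, curr)   # ids present the day before but not today
  )
})

str(changes)
```

For the toy data, the second day adds "c" and loses "a", and the third day adds "d" and "a" and loses "b".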
The comparison itself is easy with anti_join (dplyr), but I don't know how to reference individual groups in the dataset.
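For a single pair of days, the anti_join comparison might look like the sketch below. It assumes the two days are pulled out by hand with filter, which is exactly the step that does not generalise to grouped data; toy, day1, day2, added and lost are made-up names used only for illustration:

```r
library(dplyr)

# Toy data, invented purely for illustration
toy <- data.frame(
  date = c("2015-02-01", "2015-02-01", "2015-02-01",
           "2015-02-02", "2015-02-02"),
  id   = c("a", "b", "b", "b", "c"),
  stringsAsFactors = FALSE
)

# Extract each day manually and drop duplicate ids within the day
day1 <- toy %>% filter(date == "2015-02-01") %>% distinct(id)
day2 <- toy %>% filter(date == "2015-02-02") %>% distinct(id)

# anti_join(x, y) keeps the rows of x that have no match in y
added <- anti_join(day2, day1, by = "id")  # ids in day 2 but not in day 1
lost  <- anti_join(day1, day2, by = "id")  # ids in day 1 but not in day 2
```

Here added holds the id "c" and lost holds the id "a".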
My attempt (or one of my attempts) looks like:
data %>%
  group_by(date) %>%
  distinct(id) %>%
  do(lost = anti_join(., lag(.), by = "id"))
But of course this does not work; I just get:
Error in anti_join_impl(x, y, by$x, by$y) : Can't join on 'id' x 'id' because of incompatible types (factor / logical)
Is what I am attempting to do even possible or should I be looking at writing a clunky function to do it?