I have this type of data in an ordered R dataframe.
set.seed(25)
date <- sort(as.Date(sample( as.numeric(as.Date("2019-01-01")): as.numeric(as.Date("2021-03-31")), 10,
replace = T),
origin = '1970-01-01'))
type <- c("Football", "Football", "Rugby", "Football", "Hockey", "Tennis", "Hockey", "Basketball", "Basketball", "Rugby")
id <- c("1","1","1","1","2","2","3","4","4","5")
df <- data.frame(date,id, type)
date id type
2019-04-09 1 Football
2019-04-13 1 Football
2019-04-20 1 Rugby
2019-04-21 1 Football
2019-05-31 2 Hockey
2020-02-09 2 Tennis
2020-03-08 3 Hockey
2020-03-24 4 Basketball
2020-08-18 4 Football
2020-11-01 5 Rugby
The result I'm trying to get at is this:
date id type type_2
2019-04-09 1 Football Football
2019-04-13 1 Football Football
2019-04-20 1 Rugby Multi
2019-04-21 1 Football Multi
2019-05-31 2 Hockey Hockey
2020-02-09 2 Tennis Multi
2020-03-08 3 Hockey Hockey
2020-03-24 4 Basketball Basketball
2020-08-18 4 Basketball Basketball
2020-11-01 5 Rugby Rugby
Basically, the first sport in time an id practices stays if the next sport he practices is the same as the previous one, type_2 remains the same, but as soon as he changes sport later on, he changes to multi for the rest of his values later on.
I tried do this with lag()
, lead()
and if_else()
in dplyr
but the results never come out the way I want.