I have data that I need to clean, but I'm not sure how. Within each id, I need to remove every record that occurred less than 7 days after the last kept observation. Records that are themselves removed should not count as the last observation.
Data example:
library(dplyr)
library(lubridate)
df <- data.frame(id = c(rep(1, 5), rep(2, 3)),
                 date = c(ymd("2022-01-01"), ymd("2022-01-03"), ymd("2022-01-05"), ymd("2022-01-09"), ymd("2022-01-20"),
                          ymd("2022-01-02"), ymd("2022-01-03"), ymd("2022-01-09"))) %>%
  arrange(id, date)
id date
1 1 2022-01-01
2 1 2022-01-03
3 1 2022-01-05
4 1 2022-01-09
5 1 2022-01-20
6 2 2022-01-02
7 2 2022-01-03
8 2 2022-01-09
I want the result to look like this:
id date
1 1 2022-01-01
2 1 2022-01-09
3 1 2022-01-20
4 2 2022-01-02
5 2 2022-01-09
I tried using filter() and lag(), but they alone do not quite do it:
df %>%
  group_by(id) %>%
  mutate(prev = lag(date + days(7))) %>%
  ungroup() %>%
  filter(is.na(prev) | (date - prev >= 0))
id date
1 1 2022-01-01
2 1 2022-01-20
3 2 2022-01-02
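The attempt above compares each date with the previous row, whereas the rule needs a comparison with the last kept row, which is why 2022-01-09 is dropped for both ids. To make the intended rule concrete, here is a minimal sketch written as an explicit loop. The helper name keep_spaced() and its gap argument are hypothetical (not from dplyr or lubridate), and the sketch assumes the dates are already sorted within each id.

```r
library(dplyr)

# Hypothetical helper (my own name, not part of any package):
# returns TRUE for each date that is at least `gap` days after the
# last *kept* date, and FALSE otherwise. Assumes `dates` is sorted.
keep_spaced <- function(dates, gap = 7) {
  keep <- logical(length(dates))
  last_kept <- as.Date(NA)
  for (i in seq_along(dates)) {
    days_since <- as.numeric(difftime(dates[i], last_kept, units = "days"))
    if (is.na(last_kept) || days_since >= gap) {
      keep[i] <- TRUE
      last_kept <- dates[i]  # only kept rows move the reference date
    }
  }
  keep
}

# Same example data as in the question, built with base as.Date()
df <- data.frame(
  id = c(rep(1, 5), rep(2, 3)),
  date = as.Date(c("2022-01-01", "2022-01-03", "2022-01-05",
                   "2022-01-09", "2022-01-20",
                   "2022-01-02", "2022-01-03", "2022-01-09"))
)

result <- df %>%
  arrange(id, date) %>%
  group_by(id) %>%
  filter(keep_spaced(date)) %>%  # evaluated separately for each id
  ungroup()

print(result)
```

On the example data this keeps 2022-01-01, 2022-01-09 and 2022-01-20 for id 1, and 2022-01-02 and 2022-01-09 for id 2, which matches the desired output. A loop is used here because each decision depends on the previous decision, which a single vectorised lag() cannot express. A more idiomatic dplyr or purrr version may well exist.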