I have a dataset, where I have different provider
attending a site
multiple times.
I want to create columns to show start
and stop
dates where they are present.
Here's a sample dataset:
x <- tibble(
"site" = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2),
"provider" = c("A", "B", "C", "A", "A", "C", "C", "A", "C", "A"),
"date" = seq(from = as.Date("2019-01-01"), to = as.Date("2019-01-10"), by = 'day'),
)
I've been unable to create a bit of code that adequately captures the fact that a provider might be present several times. The best I got so far is:
x %>%
group_by(site, provider) %>%
mutate("start" = min(date),
"end" = max(date))
Which creates this:
# A tibble: 10 x 5
# Groups: site, provider [5]
site provider date start end
<dbl> <chr> <date> <date> <date>
1 1 A 2019-01-01 2019-01-01 2019-01-05
2 1 B 2019-01-02 2019-01-02 2019-01-02
3 1 C 2019-01-03 2019-01-03 2019-01-03
4 1 A 2019-01-04 2019-01-01 2019-01-05
5 1 A 2019-01-05 2019-01-01 2019-01-05
6 2 C 2019-01-06 2019-01-06 2019-01-09
7 2 C 2019-01-07 2019-01-06 2019-01-09
8 2 A 2019-01-08 2019-01-08 2019-01-10
9 2 C 2019-01-09 2019-01-06 2019-01-09
10 2 A 2019-01-10 2019-01-08 2019-01-10
However, this only process the individual provider
once per site
.
Here is what I would like the final dataset to look like:
# A tibble: 10 x 5
# Groups: site, provider [5]
site provider date start end
<dbl> <chr> <date> <date> <date>
1 1 A 2019-01-01 2019-01-01 2019-01-01 # A stops at 2019-01-01
2 1 B 2019-01-02 2019-01-02 2019-01-02
3 1 C 2019-01-03 2019-01-03 2019-01-03
4 1 A 2019-01-04 2019-01-04 2019-01-05 # A restarts from 2019-01-04
5 1 A 2019-01-05 2019-01-04 2019-01-05
6 2 C 2019-01-06 2019-01-06 2019-01-07
7 2 C 2019-01-07 2019-01-06 2019-01-07 # C stops at 2019-01-07
8 2 A 2019-01-08 2019-01-08 2019-01-08 # A stops at 2019-01-08
9 2 C 2019-01-09 2019-01-09 2019-01-09 # C restarts at 2019-01-09
10 2 A 2019-01-10 2019-01-10 2019-01-10 # A restarts at 2019-01-10