Suppose we start with this data frame:
mydat <- data.frame(
  ID       = c(115, 115, 115, 88, 88, 88, 100, 100),
  Period   = c(1, 2, 3, 1, 2, 3, 1, 2),
  Status_1 = c(1, 2, 1, 1, 2, 3, 2, 1),
  Status_2 = c("Open", "Open", "Terminus", "Open", "Open", "Closed", "Open", "Open")
)
> mydat
   ID Period Status_1 Status_2
1 115      1        1     Open
2 115      2        2     Open
3 115      3        1 Terminus
4  88      1        1     Open
5  88      2        2     Open
6  88      3        3   Closed
7 100      1        2     Open
8 100      2        1     Open
Next, we run the following dplyr grouping to count the distinct IDs for each combination of Period and Status_1, counting only rows where Status_2 is "Open":
library(dplyr)

mydat %>%
  group_by(Period, Status_1) %>%
  summarize(StatusCount = n_distinct(ID[Status_2 == "Open"]))
  Period Status_1 StatusCount
   <dbl>    <dbl>       <int>
1      1        1           2
2      1        2           1
3      2        1           1
4      2        2           2
5      3        1           0
6      3        3           0
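To see why the Period 3 / Status_1 1 group shows 0, here is a minimal sketch of what the subsetting does inside that single group. The vectors `ids` and `status` are hypothetical stand-ins for that group's ID and Status_2 values (only ID 115 with "Terminus").

```r
library(dplyr)

# Hypothetical stand-ins for the one row in group Period 3 / Status_1 1
ids    <- c(115)
status <- c("Terminus")

# The logical index drops every row whose Status_2 is not "Open"
open_ids <- ids[status == "Open"]   # numeric(0): nothing survives the filter

# n_distinct() of an empty vector is 0, which is what the summary reports
open_count <- n_distinct(open_ids)
print(open_count)
```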
I've been trying to extend the n_distinct() call above so that it also counts rows where Status_2 is "Terminus" (in addition to "Open"). I've tried various "or" conditions and summing tricks, with no luck so far. Any ideas how to do this?
With the "Terminus" rows included, the result would look like this (only row 5 changes):
  Period Status_1 StatusCount
   <dbl>    <dbl>       <int>
1      1        1           2
2      1        2           1
3      2        1           1
4      2        2           2
5      3        1           1
6      3        3           0
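One possible approach, offered as a sketch rather than a definitive answer, is to replace the single equality test with `%in%`, which is TRUE for any value found in the supplied set. The `.groups = "drop"` argument (available in dplyr 1.0.0 and later) is an addition not in the original code; it only makes the result ungrouped and silences the regrouping message.

```r
library(dplyr)

mydat <- data.frame(
  ID       = c(115, 115, 115, 88, 88, 88, 100, 100),
  Period   = c(1, 2, 3, 1, 2, 3, 1, 2),
  Status_1 = c(1, 2, 1, 1, 2, 3, 2, 1),
  Status_2 = c("Open", "Open", "Terminus", "Open", "Open", "Closed", "Open", "Open")
)

result <- mydat %>%
  group_by(Period, Status_1) %>%
  summarize(
    # Keep IDs whose Status_2 is either "Open" or "Terminus", then count distinct IDs
    StatusCount = n_distinct(ID[Status_2 %in% c("Open", "Terminus")]),
    .groups = "drop"
  )

print(result)
```

Writing the condition as `Status_2 == "Open" | Status_2 == "Terminus"` inside the brackets should give the same counts; `%in%` simply scales more easily if further statuses need to be added.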