I have a table containing two rows for each ID.
library(dplyr)
table <- tibble(
  id = c(1, 1, 2, 2, 3, 3, 4, 4, 5, 5),
  row1 = c(2, 5, 2, 5, 1, 3, 2, 5, 3, 2),
  row2 = c("foo", "other foo", "bar", "bar", "bar", "bar other", "other", "foo", "other", "other")
)
> table
# A tibble: 10 × 3
id row1 row2
<dbl> <dbl> <chr>
1 1 2 foo
2 1 5 other foo
3 2 2 bar
4 2 5 bar
5 3 1 bar
6 3 3 bar other
7 4 2 other
8 4 5 foo
9 5 3 other
10 5 2 other
I would like to resolve the table to a single row for each ID based on three rules in succession:
- If, for each ID, there is one row in which row1 is 5 or more, then choose the row where row1 is less than 5.
- Else if, for each ID, there is a row in which row2 contains the word 'other', choose the row where row2 does not contain the word 'other'
- Otherwise, for each ID, pick the first row.
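To make the rules concrete, here is a minimal sketch that only labels which rule would apply to each ID; it does not pick the row. The `rule` column and the `rule_per_id` name are purely illustrative, and it assumes "contains the word 'other'" can be read as a plain substring match with `grepl()`.

```r
library(dplyr)

# Same data as above, repeated so the snippet runs on its own.
table <- tibble(
  id   = c(1, 1, 2, 2, 3, 3, 4, 4, 5, 5),
  row1 = c(2, 5, 2, 5, 1, 3, 2, 5, 3, 2),
  row2 = c("foo", "other foo", "bar", "bar", "bar",
           "bar other", "other", "foo", "other", "other")
)

# Label the first rule whose condition holds within each ID.
# case_when() checks the conditions in order, like the rules above.
rule_per_id <- table %>%
  group_by(id) %>%
  summarise(
    rule = case_when(
      any(row1 >= 5)            ~ 1L,  # rule 1: some row has row1 >= 5
      any(grepl("other", row2)) ~ 2L,  # rule 2: some row2 contains "other"
      TRUE                      ~ 3L   # rule 3: fall back to the first row
    )
  )

rule_per_id$rule
# 1 1 2 1 2
```

Note that ID 5 falls under rule 2 but has no row without 'other', so in that case I would want the first row to be picked.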
I feel there must be a more straightforward way of doing this. This is my attempt so far, but I can't work out how to resolve the NA so that it returns 'bar'.
table %>%
  group_by(id) %>%
  summarise(
    row1 = ifelse(
      max(row1) >= 5,
      first(row1[row1 < 5]),
      ifelse(
        grep("other", row2),
        ifelse(
          !is.na(first(row1[grep("other", row2, invert = T)])),
          first(row1[grep("other", row2, invert = T)]),
          first(row1)
        ),
        first(row1)
      )
    ),
    row2 = ifelse(
      max(row1) >= 5,
      first(row2[row1 < 5]),
      ifelse(
        grep("other", row2),
        ifelse(
          !is.na(first(row2[grep("other", row2, invert = T)])),
          first(row2[grep("other", row2, invert = T)]),
          first(row2)
        ),
        first(row2)
      )
    )
  )
# A tibble: 5 × 3
id row1 row2
<dbl> <dbl> <chr>
1 1 2 foo
2 2 2 NA
3 3 1 bar
4 4 2 foo
5 5 3 other
Desired output:
id | row1 | row2 |
---|---|---|
1 | 2 | foo |
2 | 2 | bar |
3 | 1 | bar |
4 | 2 | other |
5 | 3 | other |
Many thanks for your help.