I'm trying to come up with an elegant, rule-based way to assign codes to rows in a data frame based on combinations of values in columns, using this data:
library(tidyr)
df <- crossing(yr2018=c("M","S","W"),
yr2019=c("M","S","W"),
yr2020=c("M","S","W")) %>%
print(n=27)
# A tibble: 27 × 3
yr2018 yr2019 yr2020
<chr> <chr> <chr>
1 M M M
2 M M S
3 M M W
4 M S M
5 M S S
6 M S W
7 M W M
8 M W S
9 M W W
10 S M M
11 S M S
12 S M W
13 S S M
14 S S S
15 S S W
16 S W M
17 S W S
18 S W W
19 W M M
20 W M S
21 W M W
22 W S M
23 W S S
24 W S W
25 W W M
26 W W S
27 W W W
>
What I want to end up with is a column with codes applied with rules such the following:
- if all 3 values in yr2018, yr2019, and yr2020 are the same (MMM, SSS, or WWW), then set the new column value to the concatenation of "CON" and whatever the unique value is, so either "CONM", "CONS", or "CONW".
- if the first and third columns are the same, but the second is different, then concatenate the two unique values together as exactly "MS","MW", or "SW", in that order, depending on which two unique values are in the row, regardless of the order of the values in the columns.
- if all three are different, regardless of order, then "MSW"
- if the last two are the same, but different from the first, then concatenate "CON" with the last value, so either "CONM", "CONS", or "CONW"
- lastly, if the first two are the same and the last different, then concatenate "CON" with the first column, so either "CONM", "CONS", or "CONW"
This feels like a big, ugly if
statement, but I'm hoping for something more elegant, especially since my real data is actually 4x5 (625 rows). It also feels like maybe regular expressions, which I struggle with.
I started looking into row-wise functions and found rowwise()
as a start to logically reconfigure the data frame, but it looks like the number of functions that can operate that way are limited.
All guidance welcome!