I'm trying to find the top 3 factor levels within each group, based on an aggregating variable, and lump the remaining factor levels into "other" for each group. Normally I'd use fct_lump_n for this, but I can't figure out how to make it work within each group. Here's an example, where I want to group by x, order the levels of y by the value of z, keep the first 3 levels of y, and lump the rest of y into "other":
library(tidyverse)

set.seed(50)
df <- tibble(x = factor(sample(letters[18:20], 100, replace = TRUE)),
             y = factor(sample(letters[1:10], 100, replace = TRUE)),
             z = sample(100, 100, replace = TRUE))
I've tried doing this:
df %>%
  group_by(x) %>%
  arrange(desc(z), .by_group = TRUE) %>%
  slice_head(n = 3)
which returns this:
# A tibble: 9 x 3
# Groups:   x [3]
  x     y         z
  <fct> <fct> <int>
1 r     i        95
2 r     c        92
3 r     a        88
4 s     g        94
5 s     g        92
6 s     f        92
7 t     j       100
8 t     d        93
9 t     i        81
This is basically what I want, but I'm missing an "other" row within each of r, s, and t that collects the values of z from the rows that were not kept.
Can I use fct_lump_n for this? Or slice_head combined with grouping the excluded variables into "other"?
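To make the first option concrete, here is a minimal sketch of what I imagine the fct_lump_n route would look like. It assumes "top 3" means the three levels of y with the largest summed z within each x, which is an assumption and differs from the row-wise attempt above (that one ranks individual rows, so a level like g can appear twice). The column name y_lumped and the objects lumped and summary_tbl are made up for illustration.

```r
library(dplyr)
library(forcats)
library(tibble)

set.seed(50)
df <- tibble(
  x = factor(sample(letters[18:20], 100, replace = TRUE)),
  y = factor(sample(letters[1:10], 100, replace = TRUE)),
  z = sample(100, 100, replace = TRUE)
)

lumped <- df %>%
  group_by(x) %>%
  # mutate() is evaluated once per group, so fct_lump_n() only sees the y and z
  # values of the current x. droplevels() removes levels of y that do not occur
  # in this group; w = z ranks levels by their summed z instead of row counts.
  # Assumption: ties in summed z may keep more than 3 levels (default ties.method),
  # and nothing is lumped if only one level would end up in "other".
  mutate(
    y_lumped = as.character(
      fct_lump_n(droplevels(y), n = 3, w = z, other_level = "other")
    )
  ) %>%
  ungroup()

# One row per x and kept level, plus an "other" row holding the remaining z
summary_tbl <- lumped %>%
  group_by(x, y_lumped) %>%
  summarise(z = sum(z), .groups = "drop")

print(summary_tbl)
```

If the row-wise ranking from my attempt is what matters instead, I suppose the same idea would be to label every row after the first three in each group as "other" before summarising, but I have not worked out whether that is the better fit.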