Here are two alternative solutions which use forcats, stringr, and regular expressions to directly manipulate factor levels.
If I understand correctly, the issue was caused by food being a factor, which is not handled appropriately by replace().
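As a minimal sketch of that suspected cause (the toy vector below is hypothetical and not taken from the question), assigning a value that is not among the existing factor levels produces NA with a warning, while the same call on a character vector works:

```r
# Hypothetical toy factor; "fruit" is not one of its levels
f <- factor(c("fruit banana", "bread"))

# replace() assigns into the factor, so the unknown level becomes NA
# (R warns: invalid factor level, NA generated)
res <- replace(f, grepl("^fruit ", f), "fruit")
res

# Converting to character first avoids the problem
chr <- replace(as.character(f), grepl("^fruit ", f), "fruit")
chr
```

Both solutions below sidestep this, either by working on the factor levels or by returning a character vector.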
1. fct_collapse()
The fct_collapse() function is used to collapse all factor levels which start with "fruit " (note the trailing blank) into factor level "fruit":
library(dplyr)
library(stringr)
library(forcats)
df %>%
  group_by(food = fct_collapse(food, fruit = levels(food) %>% str_subset("^fruit "))) %>%
  summarise(sold = sum(sold))
  food         sold
  <fct>       <dbl>
1 bread        99.4
2 egg fruits  100.
3 fruit       300.
4 fruity wine 100.
5 meat        101.
Note that an enhanced sample data set is used which includes edge cases to better test the regular expression. Furthermore, the grouping variable is computed directly in group_by(), which saves a separate call to mutate() beforehand.
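To see what happens to the levels themselves, here is a small sketch on a hypothetical toy factor (not the sample data of this answer). It first selects the levels to merge with str_subset() and then passes them to fct_collapse():

```r
library(forcats)
library(stringr)

# Hypothetical toy factor including one edge case ("fruity wine")
f <- factor(c("fruit apple", "bread", "fruit grape", "fruity wine"))

# Levels starting with "fruit " (the trailing blank excludes "fruity wine")
to_merge <- str_subset(levels(f), "^fruit ")
to_merge

# Merge the selected levels into a single new level "fruit"
collapsed <- fct_collapse(f, fruit = to_merge)
levels(collapsed)
as.character(collapsed)
```

Only the levels are rewritten, so the result is still a factor and the untouched levels are kept as they are.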
2. str_replace() with look-behind
There is an even shorter solution which uses str_replace() instead of replace() together with a more sophisticated regular expression. The regular expression uses a look-behind to delete all characters after the leading "fruit" (including the blank which follows "fruit"):
df %>%
  group_by(food = str_replace(food, "(?<=^fruit)( .*)", "")) %>%
  summarise(sold = sum(sold))
The result is the same as above.
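The effect of the look-behind can be checked in isolation on a few strings. The vector x below is a hypothetical test input, and the capture-group variant is only an assumed equivalent for comparison, not part of the solution above:

```r
library(stringr)

# Hypothetical test strings, including the edge cases
x <- c("fruit banana", "fruit", "egg fruits", "fruity wine")

# Look-behind: match a blank plus the rest of the string,
# but only directly after a leading "fruit"
with_lookbehind <- str_replace(x, "(?<=^fruit)( .*)", "")
with_lookbehind

# Assumed equivalent without look-behind: keep "fruit" via a capture group
with_capture <- str_replace(x, "^(fruit) .*", "\\1")
with_capture
```

Strings which do not start with "fruit" followed by a blank, such as "egg fruits" and "fruity wine", are returned unchanged by both patterns.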
Enhanced sample data set
set.seed(24)
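# stringsAsFactors = TRUE makes food a factor; since R 4.0.0 the default is FALSE
df <- data.frame(food = c("fruit banana", "fruit apple", "fruit grape", "bread",
                          "meat", "egg fruits", "fruity wine"),
                 sold = rnorm(7, 100),
                 stringsAsFactors = TRUE)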
df
          food      sold
1 fruit banana  99.45412
2  fruit apple 100.53659
3  fruit grape 100.41962
4        bread  99.41637
5         meat 100.84746
6   egg fruits 100.26602
7  fruity wine 100.44459