I have a data frame with one numeric variable and one grouping variable with five levels.
library(dplyr)

# set seed for reproducibility
set.seed(123)
df <- tibble(group = rep(c("a", "b", "c", "d", "e"), each = 20),
             values = c(rnorm(20, 0, 1), rnorm(20, 1, 1), rnorm(20, 2, 1),
                        rnorm(20, 3, 1), rnorm(20, 4, 1)))
I want to use summarize() to get the 25% and 75% quantiles of each group, like
df %>%
  group_by(group) %>%
  summarize(quantiles = quantile(values, c(0.25, 0.75)))
or
df %>%
  group_by(group) %>%
  summarize(quantile0.25 = quantile(values, 0.25),
            quantile0.75 = quantile(values, 0.75))
Either of these would do. I don't know which is more practical: one row per group with the two quantiles in separate columns, or two rows per group with both quantiles in a single column.
Finally, I want to use these quantiles (preferably in the same pipe) to filter out outliers in the original data frame, not the summarized one, separately within each group, something like
df %>%
group_by() %>%
summarize() %>%
filter()
where each group is filtered by its own bounds: rows below Q1 - 1.5 * IQR or above Q3 + 1.5 * IQR of that group count as outliers.
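To make the rule concrete, here is a minimal base R sketch of what I mean for a single group. The vector `x` is only a stand-in for the values of one group, and the names `lower`, `upper` and `x_kept` are made up for this illustration.

```r
# Illustration of the intended outlier rule for ONE group only (base R).
set.seed(123)
x <- rnorm(20, 0, 1)                 # stand-in for the values of one group

q <- quantile(x, c(0.25, 0.75))      # first and third quartiles
iqr <- q[[2]] - q[[1]]               # interquartile range
lower <- q[[1]] - 1.5 * iqr          # lower bound
upper <- q[[2]] + 1.5 * iqr          # upper bound

# keep only the values inside the bounds
x_kept <- x[x >= lower & x <= upper]
```

What I am missing is how to do this once per group, with each group getting its own `lower` and `upper`.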
Is this possible, and what would be the best approach? Filtering every group by a single shared threshold seems straightforward, but how do I apply a different threshold to each group?
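My rough guess at what this could look like is below. It is only a sketch, assuming that filter() on a grouped tibble evaluates quantile() and IQR() within each group rather than over the whole column, which would make the separate summarize() step unnecessary. The name `df_no_outliers` is just a placeholder.

```r
library(dplyr)

# same example data as above
set.seed(123)
df <- tibble(group = rep(c("a", "b", "c", "d", "e"), each = 20),
             values = c(rnorm(20, 0, 1), rnorm(20, 1, 1), rnorm(20, 2, 1),
                        rnorm(20, 3, 1), rnorm(20, 4, 1)))

# Assumption: inside a grouped filter(), `values` refers to the current
# group's values only, so the bounds differ from group to group.
df_no_outliers <- df %>%
  group_by(group) %>%
  filter(values >= quantile(values, 0.25, names = FALSE) - 1.5 * IQR(values),
         values <= quantile(values, 0.75, names = FALSE) + 1.5 * IQR(values)) %>%
  ungroup()
```

If that assumption holds, the result keeps the original columns and drops only the rows outside each group's own bounds, but I am not sure this is the recommended way to do it.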