I'm new to coding in R and I am having some trouble. I am trying to find the values in a column above a certain percentile (X%ile) per group, and then combining all rows below that percentile into a row others for every group.
My situation is very similar to the question here: How to use fct_lump() to get the top n levels by group and put the rest in 'other'?
Where I am grouping by two columns and trying to add rows in the second and third columns which add the name "Others" in the second column and sums all values below a percentile in the third column in the same row.
I am working with a large dataframe (df) where I have the following columns: Year (class = Integer, ie 2007, 2008, ...), SciName (class = character), and Flowers (class = numeric)
I am able to filter and only show rows with a value above a certain percentile using:
df_filter <- df %>%
filter(Flowers > quantile(Flowers, 0.7))
view(df_filter)
However, I have not been able to find a way to add the others row I need
Following the accepted answer from the similar question I linked above, I have tried:
df_Others <- df %>%
ungroup() %>%
group_by(Year) %>%
arrange(desc(Flowers)) %>%
mutate(a = row_number(-Flowers)) %>%
mutate(SciName = case_when(a < (quantile(df$Flowers, 0.7)) ~ "Others", TRUE ~ as.character(SciName))) %>%
mutate(a = case_when(a < (quantile(df$Flowers, 0.7)) ~ "Others", TRUE ~ as.character(a))) %>%
group_by(Year, SciName, a) %>%
summarize(Flowers = sum(Flowers)) %>%
arrange(Year, a) %>%
select(-a)
View(df_Others)
...but this does not work
Any suggestions on how to do this would be greatly appreciated!
EDIT:
Input:
Year SciName Flowers
2004 Liliac 2000
2004 Rose 3000
2004 Daisy 10
2004 Lily 5
2005 Liliac 20
2005 Rose 3
2005 Daisy 1000
2005 Lily 5000
... ... ...
Expected Output:
Year SciName Flowers
2004 Liliac 2000
2004 Rose 3000
2004 Others 15
2005 Daisy 1000
2005 Lily 5000
2005 Others 23
... ... ...