I have a dataframe that needs to be summarized by column B into one dataframe. I also need to summarize the same dataframe by column A into a second dataframe. For context, column B is nested under column A in the hierarchy. I only need columns C:E, so I decided that dplyr would be the most helpful.
A | B  | C | D | E | F | G
--|----|---|---|---|---|--
1 | 1A | 3 | 4 | 5 | 3 | 2
1 | 1B | 4 | 4 | 4 | 4 | 3
2 | 2A | 2 | 2 | 2 | 2 | 2
...
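For anyone who wants to run the code in this question, here is a minimal sketch that rebuilds only the three sample rows shown above as a tibble. The rows elided by "..." are not reconstructed, and storing the numeric columns as integers is an assumption based on the mention of "integer columns" below.

```r
library(dplyr)

# Only the three rows shown in the table; the rows elided by "..." are not reproduced.
# Integer literals (1L, 2L, ...) are an assumption: the data is described as integer columns.
df <- tibble(
  A = c(1L, 1L, 2L),
  B = c("1A", "1B", "2A"),
  C = c(3L, 4L, 2L),
  D = c(4L, 4L, 2L),
  E = c(5L, 4L, 2L),
  F = c(3L, 4L, 2L),
  G = c(2L, 3L, 2L)
)
```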
My team decided that wrapping this in a function would keep the code cleaner. To summarize the dataframe by column A, I know I could write something like this:
df %>%
  select(A, C, D, E) %>%
  group_by(A) %>%
  summarise(C = sum(C), D = sum(D), E = sum(E))
and, to summarize by column B, something like this:
df %>%
  select(B, C, D, E) %>%
  group_by(B) %>%
  summarise(C = sum(C), D = sum(D), E = sum(E))
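As a side note, the two pipelines above can be sketched more compactly with `across()` (available in dplyr 1.0 and later), which applies `sum()` to every column in the `C:E` range. The `select()` step is optional here, because `summarise()` keeps only the grouping column and the summaries it computes. This assumes the sample data shown in the table, restricted to the columns actually used.

```r
library(dplyr)

# Sample rows from the table above (columns F and G omitted because they are unused).
df <- tibble(
  A = c(1L, 1L, 2L),
  B = c("1A", "1B", "2A"),
  C = c(3L, 4L, 2L),
  D = c(4L, 4L, 2L),
  E = c(5L, 4L, 2L)
)

# Equivalent to summarise(C = sum(C), D = sum(D), E = sum(E)) for each grouping.
by_a <- df %>%
  group_by(A) %>%
  summarise(across(C:E, sum))

by_b <- df %>%
  group_by(B) %>%
  summarise(across(C:E, sum))
```

With these three rows, `by_a` has one row per value of A (1 and 2), and `by_b` has one row per value of B (1A, 1B, 2A).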
I am struggling to translate this into a function that works for either scenario. Here is what I have so far:
slicedata <- function(df, column_name) {
  df %>%
    select(column_name, C, D, E) %>%
    group_by(column_name) %>%
    summarise(C = sum(C), D = sum(D), E = sum(E))
}
But when I pass column B as an argument in that function, this is what I get:
slicedata(df, B)
Error in .f(.x[[i]], ...) : object 'B' not found
In short, I am trying to write a function that aggregates the integer columns of this dataframe by whichever column I pass as an argument, but I do not understand why this error occurs.
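A likely explanation: `select()` and `group_by()` use data masking and tidy selection, so inside the function they see the symbol `column_name`, not `B`. Because `df` has no column called `column_name`, dplyr falls back to evaluating the argument as an ordinary R value, and R then looks for an object named `B` in the calling environment, which does not exist. A minimal sketch of one commonly used fix, assuming dplyr 1.0 or later (with rlang's `{{ }}` "embrace" operator), is to embrace the argument so the caller's bare column name is forwarded into the dplyr verbs:

```r
library(dplyr)

# Sample rows from the table above (columns F and G omitted because they are unused).
df <- tibble(
  A = c(1L, 1L, 2L),
  B = c("1A", "1B", "2A"),
  C = c(3L, 4L, 2L),
  D = c(4L, 4L, 2L),
  E = c(5L, 4L, 2L)
)

# {{ column_name }} forwards whatever bare column name the caller passed (A or B),
# so dplyr looks it up as a column of df instead of as an R object.
slicedata <- function(df, column_name) {
  df %>%
    select({{ column_name }}, C, D, E) %>%
    group_by({{ column_name }}) %>%
    summarise(C = sum(C), D = sum(D), E = sum(E))
}

by_a <- slicedata(df, A)
by_b <- slicedata(df, B)
```

If the column name is passed as a string instead (e.g. `"B"`), the usual alternatives are `all_of(column_name)` inside `select()` and `.data[[column_name]]` inside `group_by()`.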