Summing up Rows based on similar column values in R

Question

Here is an example data frame

ID    Var1    Var2    Var3 ....... Var85
A      3       2        1            3
B      1       3        1            2
A      2       1        1            1
A      1       2        2            1
C      3       1        3            2
C      2       1        2            1
B      1       3        3            1

I want to create the following, basically summing up rows based on ID

ID    Var1    Var2    Var3 ....... Var85
A      6       5        4            5
B      2       6        4            3
C      5       2        5            3

I found a solution for only a single variable using dplyr, but I don't know how to implement that with multiple columns:

df <- df %>% group_by(ID) %>% summarise(Var1 = sum(Var1)) %>% as.data.frame()

I thought of implementing this via a loop, but I am hoping for a much simpler solution.
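
A loop is not strictly needed here. As a minimal sketch in base R, assuming every column other than ID is numeric, `aggregate()` with the formula `. ~ ID` sums all remaining columns by group. Only the four columns shown in the example are reproduced; the elided Var4 to Var84 are left out.

```r
# Example data from the question; only the columns shown there are included.
df <- data.frame(
  ID    = c("A", "B", "A", "A", "C", "C", "B"),
  Var1  = c(3, 1, 2, 1, 3, 2, 1),
  Var2  = c(2, 3, 1, 2, 1, 1, 3),
  Var3  = c(1, 1, 1, 2, 3, 2, 3),
  Var85 = c(3, 2, 1, 1, 2, 1, 1)
)

# ". ~ ID" means: apply FUN to every other column, grouped by ID.
result <- aggregate(. ~ ID, data = df, FUN = sum)
print(result)
#   ID Var1 Var2 Var3 Var85
# 1  A    6    5    4     5
# 2  B    2    6    4     3
# 3  C    5    2    5     3
```

Note that the formula interface of `aggregate()` drops rows containing `NA` by default, so this sketch assumes complete data.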

Use `across` inside `summarise`: `df %>% group_by(ID) %>% summarise(across(Var1:Var85, ~ sum(.x)))` — PaulS, Jun 16 '22 at 20:18
This was an example data frame. My actual data frame doesn't have column names in such a tidy form. — Marble, Jun 16 '22 at 20:43
I can use your solution by doing the following: export the column names to a vector, rename the columns as above, apply the solution, and then restore the original column names. — Marble, Jun 16 '22 at 20:44
Well, without seeing a piece of your dataset, it is hard to offer a better solution. — PaulS, Jun 16 '22 at 20:51
The dataset is not much different; instead of column names like c(Var1:Var85), they look like c(alpha, beta, gamma, delta, .........., omega). — Marble, Jun 16 '22 at 21:09
If these columns of yours are sequential, then use: `df %>% group_by(ID) %>% summarise(across(alpha:omega, ~ sum(.x)))`. — PaulS, Jun 16 '22 at 21:14
Error in `dplyr::summarise()`: ! Problem while computing `..1 = across(everything() ~ sum(.x))`. ℹ The error occurred in group 0: character(0). Caused by error in `across()`: ! Must supply a column selection. ℹ You most likely meant: `across(everything(), everything() ~ sum(.x))`. ℹ The first argument `.cols` selects a set of columns. ℹ The second argument `.fns` operates on each selected columns. Run `rlang::last_error()` to see where the error occurred. — Marble, Jun 16 '22 at 21:22
"ID", "X1.16", "X1.16.1", "X1.16.2", "X1.17", "X1.17.1", "X1.17.2", "X1.17.3", "X1.18" ........ and so on — Marble, Jun 16 '22 at 21:34
This works: `df <- data.frame(ID = c("A", "B", "A", "A", "C", "C", "B"), X1.16 = c(3L, 1L, 2L, 1L, 3L, 2L, 1L), X1.16.1 = c(2L, 3L, 1L, 2L, 1L, 1L, 3L), X1.16.2 = c(1L, 1L, 1L, 2L, 3L, 2L, 3L), X1.17 = c(3L, 2L, 1L, 1L, 2L, 1L, 1L)); df %>% group_by(ID) %>% summarise(across(X1.16:X1.17, ~ sum(.x)))` — PaulS, Jun 16 '22 at 21:38
Alternatively, you can use `across(2:5, ...)` instead of `across(X1.16:X1.17, ...)`. — PaulS, Jun 16 '22 at 21:40
I see that the columns of my data are factors, while your dataset is numeric. Is that the reason it is problematic? — Marble, Jun 16 '22 at 21:43
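So, do this: `df %>% group_by(ID) %>% summarise(across(X1.16:X1.17, ~ sum(.x %>% as.numeric)))`. `as.numeric` converts your factors to numbers before summing. Note that on a factor it returns the integer level codes, so use `as.numeric(as.character(.x))` if the labels are the values you want to sum. — PaulS, Jun 16 '22 at 21:47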
You meant only `df %>% group_by(ID) %>% summarise(across(X1.16:X1.17, ~ sum(.x %>% as.numeric)))`, which works perfectly. — Marble, Jun 16 '22 at 21:56
Thanks a lot for helping out; I should have cross-checked the column classes in my data frame, @PaulS. — Marble, Jun 16 '22 at 21:57
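
One caveat on the factor conversion discussed in the comments: calling `as.numeric()` directly on a factor returns its internal level codes rather than the labels, and the two only coincide when the levels happen to be exactly 1, 2, 3, and so on. The sketch below uses small hypothetical data (the values are made up for illustration, not taken from the asker's dataset) and converts through `as.character()` first. It also selects columns with `everything()`, which inside a grouped `summarise()` covers all non-grouping columns, so irregular column names do not need to be listed.

```r
library(dplyr)

# Hypothetical data: numeric values stored as factors.
df <- data.frame(
  ID      = c("A", "B", "A", "B"),
  X1.16   = factor(c("10", "20", "30", "40")),
  X1.16.1 = factor(c("5", "5", "7", "9"))
)

# as.numeric() on a factor gives the level codes, not the labels.
codes  <- as.numeric(df$X1.16)                # 1 2 3 4
values <- as.numeric(as.character(df$X1.16))  # 10 20 30 40

# Convert via character first, then sum every non-grouping column by ID.
result <- df %>%
  group_by(ID) %>%
  summarise(across(everything(), ~ sum(as.numeric(as.character(.x))))) %>%
  as.data.frame()
print(result)
```

If the factor labels are already the integers 1, 2, 3, ... with none missing, the plain `as.numeric` version in the comments gives the same sums, which may be why it worked for the asker's data.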
