1

I have a data frame that looks like this:

Col1 Col2 Col3 Col4 
10    A    5    4    
10    A    6    3   
30    B    2    7   
45    C    5    1  
45    C    2    1   

I want to sum columns 3 and 4 only grouped by Col2, such that my resulting data frame looks like

Col1 Col2 Col3 Col4 
10    A    11    7    
30    B    2     7   
45    C    7     2  

Col1 is an identifier for Col2 so I want that to remain. Thanks

  • 1
    Related/possible duplicate: https://stackoverflow.com/questions/8212699/group-by-multiple-columns-and-sum-other-multiple-columns – Sotos Feb 19 '20 at 10:17
  • Does this answer your question? [R - dataframe - sum on group by columns](https://stackoverflow.com/questions/48353547/r-dataframe-sum-on-group-by-columns) – Anil Kumar Feb 19 '20 at 10:19
  • 1
    Does this answer your question? [Group by multiple columns and sum other multiple columns](https://stackoverflow.com/questions/8212699/group-by-multiple-columns-and-sum-other-multiple-columns) – B--rian Feb 19 '20 at 12:02

3 Answers3

1

1.Minimal reproducible example data:

df <- structure(list(Col1 = c(10L, 10L, 30L, 45L, 45L),
                     Col2 = c("A", "A", "B", "C", "C"), 
                     Col3 = c(5L, 6L, 2L, 5L, 2L),
                     Col4 = c(4L, 3L, 7L, 1L, 1L)),
                row.names = c(NA, -5L), class = "data.frame")

2.Solution using dplyr

library(dplyr)

df %>%
group_by(Col1, Col2) %>%
summarise(Col3 = sum(Col3),
          Col4 = sum(Col4))

Returns:

   Col1 Col2   Col3  Col4
  <int> <chr> <int> <int>
1    10 A        11     7
2    30 B         2     7
3    45 C         7     2
dario
  • 6,415
  • 2
  • 12
  • 26
1

You can use aggregate. As Col1 is an identifier for Col2 it could be used with Col2 for grouping.

aggregate(.~Col1+Col2, df, sum)
#  Col1 Col2 Col3 Col4
#1   10    A   11    7
#2   30    B    2    7
#3   45    C    7    2

or you can use rowsum, match and cbind:

x <- rowsum(df[c("Col3","Col4")], df$Col2)
cbind(df[match(rownames(x), df$Col2), c("Col1","Col2")], x)
#  Col1 Col2 Col3 Col4
#1   10    A   11    7
#3   30    B    2    7
#4   45    C    7    2
GKi
  • 37,245
  • 2
  • 26
  • 48
0

If you just want to group by Col2 and retain Col1, maybe you can use aggregate + merge from base R like below

dfout <- merge(unique(df[1:2]),aggregate(.~Col2,df[-1],sum))

such that

> dfout
  Col2 Col1 Col3 Col4
1    A   10   11    7
2    B   30    2    7
3    C   45    7    2
ThomasIsCoding
  • 96,636
  • 9
  • 24
  • 81