1

I am looking for an efficient solution to add subtotals for every column in a new row for each category in the column 'id'. I am able to achieve the desired output by using the code below, but this approach is not efficient for large datasets. Is it possible to accomplish this using datatables?

Thanks!

data <- data.frame(id = c("a","b","a","b","c","c","c","a","a","b"),
               total = c(1,2,3,4,2,3,4,2,3,4),
               total2 = c(2,3,4,2,3,4,5,6,4,2),
               total3 = c(2,3,4,5,6,3,2,3,4,5))

data_new <- data.frame(id = character(), total = numeric(), total2 = 
numeric(), total3 = numeric())

for (i in unique(data$id)){

subset <- data[data$id == i,]

subtotals <- data.frame(id = i, total = sum(subset$total), total2 = 
sum(subset$total2), total3 = sum(subset$total3))

subset <- rbind(subset,subtotals)

data_new <- rbind(data_new, subset)


}

data_new 
Robert K
  • 71
  • 3

3 Answers3

3

And for good measure here's a data.table solution:

library(data.table)
setDT(data)
rbind(data, data[, lapply(.SD,sum), by=id])[order(id)]
#    id total total2 total3
# 1:  a     1      2      2
# 2:  a     3      4      4
# 3:  a     2      6      3
# 4:  a     3      4      4
# 5:  a     9     16     13
# 6:  b     2      3      3
# 7:  b     4      2      5
# 8:  b     4      2      5
# 9:  b    10      7     13
#10:  c     2      3      6
#11:  c     3      4      3
#12:  c     4      5      2
#13:  c     9     12     11

Group by= the id variable, then calculate the sum for each variable other than the id - via lapply(.SD,sum). Then rbind this back to the main set, then order the rows by id.

thelatemail
  • 91,185
  • 12
  • 128
  • 188
1

Here is a tidyverse style approach:

library(tidyverse)

data_new <- bind_rows(data, 
                      data %>% 
                        group_by(id) %>% 
                        summarise_all(sum)
                      ) %>% 
  arrange(id) %>% 
  print()
#>    id total total2 total3
#> 1   a     1      2      2
#> 2   a     3      4      4
#> 3   a     2      6      3
#> 4   a     3      4      4
#> 5   a     9     16     13
#> 6   b     2      3      3
#> 7   b     4      2      5
#> 8   b     4      2      5
#> 9   b    10      7     13
#> 10  c     2      3      6
#> 11  c     3      4      3
#> 12  c     4      5      2
#> 13  c     9     12     11

Created on 2019-01-10 by the reprex package (v0.2.1)

Bryan Shalloway
  • 748
  • 7
  • 15
1

Here is a base R solution using aggregate. Thanks to @thelatemail for streamlining the original version.

SubTotals = aggregate(data[,2:4], data["id"], sum)
data_new = rbind(data, SubTotals)
data_new = data_new[order(data_new$id),]
data_new
   id total total2 total3
1   a     1      2      2
3   a     3      4      4
8   a     2      6      3
9   a     3      4      4
11  a     9     16     13
2   b     2      3      3
4   b     4      2      5
10  b     4      2      5
12  b    10      7     13
5   c     2      3      6
6   c     3      4      3
7   c     4      5      2
13  c     9     12     11
G5W
  • 36,531
  • 10
  • 47
  • 80
  • `list(data$id)` could be `data["id"]` instead too, which will avoid the need to do the renaming. Or even `aggregate(. ~ id, data, sum)` – thelatemail Jan 10 '19 at 22:53
  • @thelatemail That is a great improvement. I will add it to the solution. – G5W Jan 10 '19 at 23:11