3

I have a simple dataframe like the following:

ID, Type, a, b, c, d, e, f, etc.
ob1, 1,   1, 2, 3, 4, 5, 6, etc.
ob1, 2,   3, 4, 5, 6, 7, 1, etc.

I need to add the values of every 3 columns together, to produce new columns with the summed values. This would produce the following output:

ID, Type, sum1, sum2,  etc.
ob1, 1,     6,   15,   etc.
ob1, 2,    12,   14,   etc.

Using sequencing, I can do this manually for individual columns, but because I have many columns, how can I perform this summation automatically for every 3 columns (after a set starting point)?

M--
  • 25,431
  • 8
  • 61
  • 93
flâneur
  • 633
  • 2
  • 8

4 Answers4

4

You can split your dataframe into pieces. Here, I am using n_start to set the starting column and exclude those columns before that from splitting (df1[,-(1:(n_start-1))]).

Then I loop through (map) the sequence of the columns to slice the dataframe for every 3 columns (crating a list of dataframes).

Using map_dfc I get the rowSums for each of these new sliced dataframes.

I set the names to sum1, sum2, etc. and lastly bind the excluded columns before the starting point to get the final result.

library(tidyverse)

n_split <- 3
n_start <- 3

seq(n_split, ncol(df1), n_split) %>%   
  map(~ select(df1[,-(1:(n_start-1))],(.-(n_split-1)):.)) %>%  
  map_dfc(rowSums) %>% 
  set_names(., nm = paste0("sum", seq(ncol(.)))) %>% 
  bind_cols(df1[,1:(n_start-1)], .)
#>   # A tibble: 5 x 4
#>    ID type sum1 sum2 sum3
#> 1 ob1    1    6   15   24
#> 2 ob2    2   36   45   54
#> 3 ob3    3   66   75   84
#> 4 ob4    4   96  105  114
#> 5 ob5    5  126  135  144

Data:

df1 <- data.frame(ID = c("ob1", "ob2", "ob3", "ob4", "ob5"),
                  type = c(1, 2, 3, 4, 5),
                  a = c(1, 11, 21, 31, 41), 
                  b = c(2, 12, 22, 32, 42), 
                  c = c(3, 13, 23, 33, 43), 
                  d = c(4, 14, 24, 34, 44), 
                  e = c(5, 15, 25, 35, 45), 
                  f = c(6, 16, 26, 36, 46), 
                  g = c(7, 17, 27, 37, 47), 
                  h = c(8, 18, 28, 38, 48), 
                  i = c(9, 19, 29, 39, 49))
M--
  • 25,431
  • 8
  • 61
  • 93
2

We could use split.default in base R

cbind(df1[1:2], lapply(split.default(df1[-(1:2)],
   paste0('sum', as.integer(gl(ncol(df1)-2, 3, ncol(df1)-2)))), rowSums))
   ID type sum1 sum2 sum3
1 ob1    1    6   15   24
2 ob2    2   36   45   54
3 ob3    3   66   75   84
4 ob4    4   96  105  114
5 ob5    5  126  135  144

data

df1 <- structure(list(ID = c("ob1", "ob2", "ob3", "ob4", "ob5"), type = c(1, 
2, 3, 4, 5), a = c(1, 11, 21, 31, 41), b = c(2, 12, 22, 32, 42
), c = c(3, 13, 23, 33, 43), d = c(4, 14, 24, 34, 44), e = c(5, 
15, 25, 35, 45), f = c(6, 16, 26, 36, 46), g = c(7, 17, 27, 37, 
47), h = c(8, 18, 28, 38, 48), i = c(9, 19, 29, 39, 49)), 
class = "data.frame", row.names = c(NA, 
-5L))
akrun
  • 874,273
  • 37
  • 540
  • 662
1

You can do it just like that. Where df is your original data.frame you would use the mutate command:

library(dplyr)
    
df%>%
    mutate(sum1=(a+b+c),
           sum2=(d+e+f))%>%
    select(ID, Type, sum1, sum2, etc)

Edit 2

Alternatively you could define a pattern with c_across, rowwise() + mutate doing the sum of 3 in 3, indicating the column number (variable) instead of the name of each column (variable).

library(tidyverse)

df<-data.frame(ID=c('ob1','ob1'),
               Type=c(1,2),
               a=c(1,3),
               b=c(2,4),
               c=c(3,5),
               d=c(4,6),
               e=c(5,7),
               f=c(6,1),
               etc=c('etc','etc'))%>%
  rowwise()%>%
  mutate(sum1=sum(c_across(3:5)),
         sum2=sum(c_across(6:8)))%>%
  select(ID,Type,sum1,sum2,etc)

df
#> # A tibble: 2 x 5
#> # Rowwise: 
#>   ID     Type  sum1  sum2 etc  
#>   <chr> <dbl> <dbl> <dbl> <chr>
#> 1 ob1       1     6    15 etc  
#> 2 ob1       2    12    14 etc

Created on 2022-10-28 by the reprex package (v2.0.1)

wesleysc352
  • 579
  • 1
  • 8
  • 21
  • Thank you, this works. However, I am trying to automate this, because I have so many columns and cannot type it out manually. Is there a way to index every three columns to be summed, rather than doing it manually like this? – flâneur Oct 28 '22 at 20:57
  • I edited the answer using a `rowwise()` and `c-across` approach – wesleysc352 Oct 29 '22 at 02:25
1

In base R you can do something like this:

num_cols <- df[-c(1:2)]

cbind(df[1:2], do.call(cbind, 
                       lapply(setNames(seq(1,length(num_cols), 3), 
                                       paste0("sum", seq(length(num_cols)/3))), \(a) {
  apply(num_cols[a:(a + 2)], 1, \(b) sum(as.numeric(gsub(",", "", b))))
  
})))

Because there are commas, I used gsub to remove them, setNames is used to give each column a dynamic name, apply is used within lapply to summarise each row

   ID. Type. sum1 sum2
1 ob1,    1,    6   15
2 ob1,    2,   12   14
Just James
  • 1,222
  • 2
  • 7