2

I want to sum across undefined amount of columns that start with the same string pattern. I would like for each column of my new data frame to have the column name of the string that was used in column name search. However, I am not sure, how to automatically assign column names that it would result in a given format, e.g.

c(m = "m", w = "w")

I want to use lapply in combination with rowSums like here:

lapply(c(m = "m", w = "w"),
                 \(x) rowSums(df[startsWith(names(df), x)]))

Basic input:

#        m_16 w_16 w_17 m_17 w_18 m_18 
#values1    3    4    8    1   12    4
#values2    8    0   12    1    3    2  

Desired output:

#        m_16 w_16 w_17 m_17 w_18 m_18     m     w
#values1    3    4    8    1   12    4     8    24
#values2    8    0   12    1    3    2    11    15

However, as I have mentioned above, there could be more columns and they could start with z, w, etc, and sums also should be calculated, so I want to vectorize the "column name giving" and not to assign the column names by hand.

I have tried looking for it through other stackoverflow threads, but wasn't sure how to search for this problem and have no idea how to solve it myself, beside assigning column names afterwards.

aerospace
  • 35
  • 4

2 Answers2

2

Supposing your first column is named # and the other columns are named in a pattern like letter_SomethingElse.

search_pattern <- unique(gsub("(?<=^[a-z]).*", "\\1", names(df), perl = TRUE))[-1]
names(search_pattern) <- search_pattern

cbind(df, lapply(search_pattern, \(x) rowSums(df[startsWith(names(df), x)])))

returns

         # m_16 w_16 w_17 m_17 w_18 m_18  m  w
1 #values1    3    4    8    1   12    4  8 24
2 #values2    8    0   12    1    3    2 11 15
Martin Gal
  • 16,640
  • 5
  • 21
  • 39
  • 1
    Thank you very much for the answer! I did not know that names is a valid command for objects that are not data frames and somehow thought it would be a difficult assignment. Thank you! – aerospace Nov 17 '22 at 14:12
1

A tidyverse solution:

# reproducible data
df <-read_csv("m_16, w_16, w_17, m_17, w_18, m_18 
3,    4,    8,    1,   12,    4
8,    0,   12,    1,    3,    2")

# get unique first character of col names
map_dfc(unique(substring(names(df), 1,1)),
    function(x){
      # for each col grouping do a rowwise sum
      df %>% select(starts_with(x)) %>% 
        rowwise() %>%
        mutate(!!x := sum(cur_data()))
    })

# A tibble: 2 × 8
# Rowwise: 
   m_16  m_17  m_18     m  w_16  w_17  w_18     w
  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1     3     1     4     8     4     8    12    24
2     8     1     2    11     0    12     3    15
pluke
  • 3,832
  • 5
  • 45
  • 68
  • 1
    By using `rowSums` you could avoid the `rowwise` part: `map_dfc(unique(substring(names(df), 1,1)), ~ df %>% select(starts_with(.x)) %>% mutate(!!.x := rowSums(cur_data())))`. Nice solution though. – Martin Gal Nov 17 '22 at 15:38
  • 1
    ah, thanks, can't say I often do rowwise work! – pluke Nov 17 '22 at 15:45