0

My data.frage looks like this:

   VAR1 VAR2 AUS1 AUS2 AUS3 AUS4 ... AUS56 VAR3 VAR4
   A    D    23   234  34   856  ... 99    0    FCK
   B    D    55   76   55   36   ... 6456  0    XYC

I'd like R to add a new variable AUS which shows the rowsums of the variables AUS1 to AUS56, preferably with dplyr. AUS1 to AUS56 can then be deleted.

Laubsauger
  • 69
  • 2
  • 9
  • Consider using the tidy 'long' format for the data, where `AUS` is one column, and the number that comes after is another column. Then you'll aggregate easily with the most traditional `dplyr` verbs like `group_by` and `summarise`. – liborm Apr 20 '17 at 12:57

3 Answers3

3

You can try use rowSums in combination with grep:

df %>% mutate(AUS_sum = rowSums(.[grep("AUS", names(.))]))
mtoto
  • 23,919
  • 4
  • 58
  • 71
  • I wonder if this could be done using `dplyr::select_if?` instead of `grep`. – zx8754 Apr 20 '17 at 09:58
  • This does work. But is there any way to delete the old variables at the same time? Transmute deleted ALL remaining variables, but I just want to get rid of those that have been used to make the calculation. – Laubsauger Apr 20 '17 at 10:21
2

Here is another option using tidyverse syntax

library(tidyverse)
df1 %>% 
     select(matches("AUS")) %>% 
     reduce(`+`) %>%
     mutate(df1, AUS_sum = .)
#   VAR1 VAR2 AUS1 AUS2 AUS3 AUS4 AUS56 VAR3 VAR4 AUS_sum
#1    A    D   23  234   34  856    99    0  FCK    1246
#2    B    D   55   76   55   36  6456    0  XYC    6678

With the devel version of dplyr (soon to be released 0.6.0) we can create a function with quosures and make it more dynamic. Here, the enquo does similar functionality as substitute from base R by taking the input arguments and converting it to quosure, with quo_name, we convert it to string where matches takes string argument. The lhs name can also be created as string ('newN') and within the mutate/summarise/group_by, we unquote (!! or UQ) to evaluate the string

fSum <- function(dat, pat){
  pat <- quo_name(enquo(pat))
  newN <- paste0(pat, "_sum")
  newSum <- dat %>%
            select(matches(pat)) %>%
            reduce(`+`)
  dat %>%
      mutate(!!newN :=  newSum)
}

fSum(df1, AUS)
#    VAR1 VAR2 AUS1 AUS2 AUS3 AUS4 AUS56 VAR3 VAR4 AUS_sum
#1    A    D   23  234   34  856    99    0  FCK    1246
#2    B    D   55   76   55   36  6456    0  XYC    6678

Based on the OP's comment on the other post about removing the columns that used for sum, we can modify the function

fSumN <- function(dat, pat){
  pat <- quo_name(enquo(pat))
  newN <- paste0(pat, "_sum")
  newSum <- dat %>%
            select(matches(pat)) %>%
            reduce(`+`)
  dat %>%
       select(-matches(pat)) %>%
       mutate(!!newN :=  newSum)
}

fSumN(df1, AUS)
#     VAR1 VAR2 VAR3 VAR4 AUS_sum
#1    A    D    0  FCK    1246
#2    B    D    0  XYC    6678

data

df1 <- structure(list(VAR1 = c("A", "B"), VAR2 = c("D", "D"), AUS1 = c(23L, 
55L), AUS2 = c(234L, 76L), AUS3 = c(34L, 55L), AUS4 = c(856L, 
36L), AUS56 = c(99L, 6456L), VAR3 = c(0L, 0L), VAR4 = c("FCK", 
"XYC")), .Names = c("VAR1", "VAR2", "AUS1", "AUS2", "AUS3", "AUS4", 
 "AUS56", "VAR3", "VAR4"), class = "data.frame", row.names = c(NA, 
-2L))
akrun
  • 874,273
  • 37
  • 540
  • 662
  • 1
    Nice. Also you could just do `mutate(sum1 = rowSums(.))` after `select` – Sotos Apr 20 '17 at 10:08
  • The first gives me an error message: `Error in function_list[[i]](value) : could not find function "reduce"`. The second one, honestly, looks rather complicated for such a simple operation. – Laubsauger Apr 20 '17 at 10:20
  • @Laubsauger If you loaded `tidyverse`, the `library(purrr)` should get loaded. The function is from that package. THe second one is using non-standard evaluation making it more simpler when compared to the earlier versions with `interp` and stuff like that. Function have the advantage of using multiple times with multiple patterns – akrun Apr 20 '17 at 10:21
  • @Laubsauger Also updated with a new function to `get rid of those that have been used to make the calculation` – akrun Apr 20 '17 at 10:27
  • Great. Thanks a lot. But now that I we are the function option already, can all this be done in a loop, too? Because I actually have numerous variable groups, not only AUT. So can I do the following in one command: `wiot <- wiot %>% mutate(AUS = rowSums(.[grep("AUS", names(.))])) %>% mutate(AUT = rowSums(.[grep("AUT", names(.))])) %>% mutate(BEL = rowSums(.[grep("BEL", names(.))])) %>% mutate(BGR = rowSums(.[grep("BGR", names(.))]))` – Laubsauger Apr 20 '17 at 10:34
  • @Laubsauger It could be possible, but please check your question. These new criteria was not mentioned in your post – akrun Apr 20 '17 at 10:39
  • 1
    if you have a new question, you can post one instead of updating the current one with new requirements. @Laubsauger – mtoto Apr 20 '17 at 10:44
  • @mtoto, you're absolutely right. I've posted a follow-up question: http://stackoverflow.com/questions/43518958/rowsums-conditional-on-clumn-name-in-a-loop – Laubsauger Apr 20 '17 at 12:01
0

In base R:

df$AUS <- rowSums(df[,grep('AUS', names(df))])
989
  • 12,579
  • 5
  • 31
  • 53