82

I'm trying to mutate a new variable from sort of row calculation, say rowSums as below

iris %>% 
  mutate_(sumVar = 
            iris %>% 
            select(Sepal.Length:Petal.Width) %>%
            rowSums)

the result is that "sumVar" is truncated to its first value(10.2):

Source: local data frame [150 x 6]
Groups: <by row>

   Sepal.Length Sepal.Width Petal.Length Petal.Width Species sumVar
1           5.1         3.5          1.4         0.2  setosa   10.2
2           4.9         3.0          1.4         0.2  setosa   10.2
3           4.7         3.2          1.3         0.2  setosa   10.2
4           4.6         3.1          1.5         0.2  setosa   10.2
5           5.0         3.6          1.4         0.2  setosa   10.2
6           5.4         3.9          1.7         0.4  setosa   10.2
..
Warning message:
Truncating vector to length 1 

Should it be rowwise applied? Or what's the right verb to use in these kind of calculations.

Edit:

More specifically, is there any way to realize the inline custom function with dplyr?

I'm wondering if it is possible do something like:

iris %>% 
  mutate(sumVar = colsum_function(Sepal.Length:Petal.Width))
leoluyi
  • 942
  • 1
  • 7
  • 14
  • 5
    Really strange that `iris %>% select(Sepal.Length:Petal.Width) %>% rowSums()` works fine but `iris %>% mutate(sumVar = iris %>% select(Sepal.Length:Petal.Width) %>% rowSums())` throws a "Error: Bad indices 1" + warning message. – talat Dec 08 '14 at 09:38
  • I am trying to work on it with different approaches, but this error appears very frequently using `.` (I am also doing something silly sometimes). – Davide Passaretti Dec 08 '14 at 09:49
  • 3
    For operations like `sum` that already have an efficient vectorised row-wise alternative, the proper way is currently: `df %>% mutate(total = rowSums(across(where(is.numeric))))` `across` can take anything that `select` can (e.g. `rowSums(across(Sepal.Length:Petal.Width))` also works). See the full spiel about [row-wise](https://dplyr.tidyverse.org/articles/rowwise.html) and [across](https://dplyr.tidyverse.org/reference/across.html) – Fons MA Apr 15 '21 at 08:46

7 Answers7

142

This is more of a workaround but could be used

iris %>% mutate(sumVar = rowSums(.[1:4]))

As written in comments, you can also use a select inside of mutate to get the columns you want to sum up, for example

iris %>% 
  mutate(sumVar = rowSums(select(., contains("Sepal")))) %>% 
  head 

or

iris %>% 
  mutate(sumVar = select(., contains("Sepal")) %>% rowSums()) %>% 
  head
talat
  • 68,970
  • 21
  • 126
  • 157
  • 1
    Which version of `dplyr` are you using? When I try you example with `dplyr_0.4.1`, I receive an exception: `Error in is.data.frame(x) : object '.' not found`. – Jubbles May 07 '15 at 19:19
  • 3
    If it's of use to anyone, the reason why I was receiving the error `Error in is.data.frame(x) : object '.' not found` was because I had an old version of `magrittr`. When I updated from `magrittr_1.0.1` to `magrittr_1.5`, everything worked fine. – Jubbles May 07 '15 at 19:37
  • This very useful answer, I was wondering whether it would be possible to expand by using dplyr's **`match`** or **`contains`** instead of the `.[1:4]`? – Konrad Feb 28 '16 at 19:22
  • 12
    @Konrad, you could do something like `iris %>% mutate(sumVar = rowSums(select(., contains("Sepal")))) %>% head` or `iris %>% mutate(sumVar = select(., contains("Sepal")) %>% rowSums()) %>% head` – talat Feb 28 '16 at 21:23
  • @docendodiscimus Thanks very much for the useful comment. – Konrad Feb 29 '16 at 07:53
  • 2
    The comment by @docendodiscimus really should be another (vote-able) answer. It is the most robust dplyr-esque solution. – D. Woods Apr 28 '16 at 04:32
  • 1
    It's nice that this works, although Hadley says that a solution like this "works by coincidence, not by design. I wouldn't rely on it." But maybe it is supported now? Does anyone know? https://github.com/tidyverse/dplyr/issues/2050 – Melkor.cz Mar 12 '18 at 19:22
23

You can use rowwise() function:

iris %>% 
  rowwise() %>% 
  mutate(sumVar = sum(c_across(Sepal.Length:Petal.Width)))

#> # A tibble: 150 x 6
#> # Rowwise: 
#>    Sepal.Length Sepal.Width Petal.Length Petal.Width Species sumVar
#>           <dbl>       <dbl>        <dbl>       <dbl> <fct>    <dbl>
#>  1          5.1         3.5          1.4         0.2 setosa    10.2
#>  2          4.9         3            1.4         0.2 setosa     9.5
#>  3          4.7         3.2          1.3         0.2 setosa     9.4
#>  4          4.6         3.1          1.5         0.2 setosa     9.4
#>  5          5           3.6          1.4         0.2 setosa    10.2
#>  6          5.4         3.9          1.7         0.4 setosa    11.4
#>  7          4.6         3.4          1.4         0.3 setosa     9.7
#>  8          5           3.4          1.5         0.2 setosa    10.1
#>  9          4.4         2.9          1.4         0.2 setosa     8.9
#> 10          4.9         3.1          1.5         0.1 setosa     9.6
#> # ... with 140 more rows

"c_across() uses tidy selection syntax so you can to succinctly select many variables"'

Finally, if you want, you can use %>% ungroup at the end to exit from rowwise.

HBat
  • 4,873
  • 4
  • 39
  • 56
  • 16
    For operations like `sum` that already have an efficient vectorised row-wise alternative, the proper way is currently: `df %>% mutate(total = rowSums(across(where(is.numeric))))` `across` can take anything that `select` can (e.g. `rowSums(across(Sepal.Length:Petal.Width))` also works). Scroll down the [row-wise](https://dplyr.tidyverse.org/articles/rowwise.html) vignette to find this and have a look at [across](https://dplyr.tidyverse.org/reference/across.html) – Fons MA Apr 15 '21 at 08:52
  • @FonsMA this should rather be an answer, and even an accepted answer, than a comment – Dan Chaltiel Jan 10 '23 at 09:15
  • @DanChaltiel, thanks, I didn't see the comment had this many upvotes! it's now an answer below – Fons MA Jan 13 '23 at 00:45
12

A more complicated way would be:

 iris %>% select(Sepal.Length:Petal.Width) %>%
mutate(sumVar = rowSums(.)) %>% left_join(iris)
Davide Passaretti
  • 2,741
  • 1
  • 21
  • 32
  • Thanks Davide. `left_join` sounds a nice solution if using it with `by` key; however, it's not so robust and intuitive for this circumstance – leoluyi Dec 10 '14 at 06:29
  • I also worry the automatic "by" parameter selection in the join could cause some troubles.. the columns could contain non-unique values on some rows .. – Melkor.cz Mar 12 '18 at 19:09
9

Adding @docendodiscimus's comment as an answer. +1 to him!

iris %>% mutate(sumVar = rowSums(select(., contains("Sepal"))))
psychonomics
  • 714
  • 4
  • 12
  • 26
3

I am using this simple solution, which is a more robust modification of the answer by Davide Passaretti:

iris %>% select(Sepal.Length:Petal.Width) %>%
  transmute(sumVar = rowSums(.)) %>% bind_cols(iris, .)

(But it requires a defined row order, which should be fine, unless you work with remote datasets perhaps..)

Melkor.cz
  • 1,977
  • 17
  • 15
2

As requested, transforming my commment into an answer:

For operations like sum that already have an efficient vectorised row-wise alternative, the proper way is currently:

df %>% mutate(total = rowSums(across(where(is.numeric))))

across can take anything that select can (e.g. rowSums(across(Sepal.Length:Petal.Width)) also works).

Scroll down the row-wise vignette to find this and have a look at across

Fons MA
  • 1,142
  • 1
  • 12
  • 21
1

You can also use a grep in place of contains or matches, just in case you need to get fancy with the regular expressions (matches doesn't seem to much like negative lookaheads and the like in my experience).

iris %>% mutate(sumVar = rowSums(select(., grep("Sepal", names(.)))))
llewmills
  • 2,959
  • 3
  • 31
  • 58