4

I would like to concatenate an arbitrary number of columns in a data frame, where the column names are given in a variable, cols_to_concat:

library(dplyr)  # needed for %>%; data_frame() is deprecated in favour of tibble()
df <- dplyr::tibble(a = letters[1:3], b = letters[4:6], c = letters[7:9])
cols_to_concat <- c("a", "b", "c")

To achieve the desired result with this specific value of cols_to_concat I could do this:

df %>% 
  dplyr::mutate(concat = paste0(a, b, c))

But I need to generalise this, using syntax a bit like this:

# (DOES NOT WORK)
df %>% 
  dplyr::mutate(concat = paste0(cols_to_concat))

I'd like to use the new NSE approach of dplyr 0.7.0, if this is appropriate, but I can't figure out the correct syntax.

RobinL

3 Answers

9

You can do this with the tidyverse alone if you'd like to stick to those packages and principles, using either mutate() from dplyr or unite_() from tidyr.

Using mutate()

library(dplyr)
df <- tibble(a = letters[1:3], b = letters[4:6], c = letters[7:9])
cols_to_concat <- c("a", "b", "c")

df %>% mutate(new_col = do.call(paste0, .[cols_to_concat]))

# A tibble: 3 × 4
      a     b     c new_col
  <chr> <chr> <chr>   <chr>
1     a     d     g     adg
2     b     e     h     beh
3     c     f     i     cfi
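
The do.call() step works because a data frame (or tibble) is a list of columns, so each selected column is handed to paste0() as a separate argument. A minimal base-R sketch of that equivalence, where the names selected, via_do_call and by_hand are illustrative and not part of the original answer:

```r
# Base R only: a data frame is a list of equal-length column vectors.
df <- data.frame(a = letters[1:3], b = letters[4:6], c = letters[7:9],
                 stringsAsFactors = FALSE)
cols_to_concat <- c("a", "b", "c")

# Selecting with a character vector keeps the list-of-columns structure.
selected <- df[cols_to_concat]

# do.call() passes each element of the list to paste0() as its own argument.
via_do_call <- do.call(paste0, selected)

# The same call written out by hand for this particular set of columns.
by_hand <- paste0(df$a, df$b, df$c)

identical(via_do_call, by_hand)
```

One caveat: the column names are passed along as argument names, so a column that happened to be called collapse would be treated as paste0()'s collapse argument. Wrapping the selection in unname() should avoid that.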

Using unite_()

library(tidyr)
df %>% unite_(col='new_col', cols_to_concat, sep="", remove=FALSE)

# A tibble: 3 × 4
  new_col     a     b     c
*   <chr> <chr> <chr> <chr>
1     adg     a     d     g
2     beh     b     e     h
3     cfi     c     f     i
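
Since unite_() is deprecated in current tidyr, the same result can probably be obtained with unite() and the all_of() selection helper. This is a sketch that assumes a recent tidyr and dplyr; the name united is only for illustration:

```r
library(dplyr)
library(tidyr)

df <- tibble(a = letters[1:3], b = letters[4:6], c = letters[7:9])
cols_to_concat <- c("a", "b", "c")

# all_of() tells tidyselect to treat the character vector as column names.
# remove = FALSE keeps the original columns alongside the new one.
united <- df %>%
  unite(col = "new_col", all_of(cols_to_concat), sep = "", remove = FALSE)

united
```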

EDITED July 2020

As of dplyr 1.0.0, across() and c_across() supersede scoped variants like mutate_if(), mutate_at() and mutate_all(), and the underscore verbs (e.g. unite_()) are deprecated. Below is an example using the newer convention. It is not the most concise option, but it should be more extensible.

Using c_across()

library(dplyr)

df <- tibble(a = letters[1:3], b = letters[4:6], c = letters[7:9])
cols_to_concat <- c("a", "b", "c")

df %>% 
  rowwise() %>% 
  mutate(new_col = paste0(c_across(all_of(cols_to_concat)), collapse=""))
#> # A tibble: 3 x 4
#> # Rowwise: 
#>   a     b     c     new_col
#>   <chr> <chr> <chr> <chr>  
#> 1 a     d     g     adg    
#> 2 b     e     h     beh    
#> 3 c     f     i     cfi

Created on 2020-07-08 by the reprex package (v0.3.0)
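
If the rowwise() step is unwanted, the do.call() idea from the first example can be combined with a tidyselect helper. The sketch below assumes dplyr 1.1.0 or later, where pick() is available; the name result is illustrative:

```r
library(dplyr)

df <- tibble(a = letters[1:3], b = letters[4:6], c = letters[7:9])
cols_to_concat <- c("a", "b", "c")

# pick() returns the selected columns as a tibble, which is a list of
# columns, so do.call() can pass them to paste0() in a vectorised way.
result <- df %>%
  mutate(new_col = do.call(paste0, pick(all_of(cols_to_concat))))

result
```

Because paste0() runs once over whole columns here instead of once per row, this variant leaves the result ungrouped and is likely to scale better on large data.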

Steven M. Mortimer
5

You can try syms from rlang:

library(dplyr)
packageVersion('dplyr')
#[1] ‘0.7.0’
df <- dplyr::tibble(a = letters[1:3], b = letters[4:6], c = letters[7:9])
cols_to_concat <- c("a", "b", "c")

library(rlang)
cols_quo <- syms(cols_to_concat)
df %>% mutate(concat = paste0(!!!cols_quo))

# or
df %>% mutate(concat = paste0(!!!syms(cols_to_concat)))

# # A tibble: 3 x 4
#       a     b     c concat
#   <chr> <chr> <chr>  <chr>
# 1     a     d     g    adg
# 2     b     e     h    beh
# 3     c     f     i    cfi
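
To see what the !!! operator is doing here, it can help to build the call outside of mutate() and inspect it. A small sketch using rlang's expr() and eval_tidy(); the names built_call and concat are illustrative:

```r
library(rlang)

cols_to_concat <- c("a", "b", "c")

# syms() turns the strings into symbols; !!! splices them in as
# separate arguments, producing the call paste0(a, b, c).
built_call <- expr(paste0(!!!syms(cols_to_concat)))
print(built_call)

# Evaluating the call with the data frame as a data mask mimics
# what mutate() does with the spliced expression.
df <- data.frame(a = letters[1:3], b = letters[4:6], c = letters[7:9],
                 stringsAsFactors = FALSE)
concat <- eval_tidy(built_call, data = df)
concat
```
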
mt1022
-1

You can do the following:

library(dplyr)

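df <- dplyr::tibble(a = letters[1:3], b = letters[4:6], c = letters[7:9])

# Turn each column name into a symbol
cols_to_concat <- lapply(list("a", "b", "c"), as.name)

# Splice the symbols into one quoted call, paste0(a, b, c)
q <- quo(paste0(!!!cols_to_concat))

# Unquote the single quosure inside mutate()
df %>% 
  dplyr::mutate(concat = !!q)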
RobinL
  • Having experimented further, the above code works - although I must admit, I'm not sure I fully understand it. Please add another answer if there's a better way of doing this and I'll accept it... – RobinL Jun 18 '17 at 09:08
  • 3
    Yes, there are many easy ways of achieving this, such as `df$concat <- do.call(paste0, df[cols_to_concat])`, but unfortunately everything these days has to be achieved with the freaking dplyr, so good luck with your quest there. – David Arenburg Jun 18 '17 at 10:28
  • Thanks - I did experiment with `do.call` but couldn't figure it out. I hadn't realised you could give it a df, but thinking about it, it's obvious because a df is just a list. This is clearly a better solution than the tidyverse answer. – RobinL Jun 18 '17 at 10:32
  • The `tidyverse` provides a convenience pasting function called `unite()`, but it's also possible to use `do.call` and `paste0` inside `mutate`. I've provided both of these solutions as an answer. – Steven M. Mortimer Jun 19 '17 at 15:23