Merge 2 variables vertically R tidyverse

Question

I conducted a survey in 2 languages and I want to combine the questions in the 2 languages into one variable.

The answers of the form are all in the same data.frame. The date is my primary key. Unfortunately I am still new to R and could not find how to elegantly combine these.

Example As Is situation

Date	Place_English	Plane_English	Place_French	Plane_French
One			azea	Three
Two	ertert	ertt

To Be

Date	Place	Plane
One	azea	Three
Two	ertert	ertt

Thanks for the suggestion! :) I think this will not work in my case because there are some missing values in the set of questions in 1 language. — Josse_, Aug 05 '21 at 13:57
That's pretty much what `coalesce` does really well - it ignores the first NAs until it finds a non-NA value and returns that one. See my example below. The tricky bit is that there are some warnings with factors. — coffeinjunky, Aug 05 '21 at 14:09

score 4 · Answer 1 · answered Aug 05 '21 at 14:02

Just to follow up on my comment, assuming the empty values are NAs:

library(tidyverse)

Creating the data:

df <- data.frame(place_english = c(NA, "ertert"), 
                 plane_english = c(NA, "ertt"), 
                 place_french = c("azea", NA), 
                 plane_french=c("Three", NA),
                 stringsAsFactors = F)

Using coalesce to replace NAs with the first non-NA values:

df %>% mutate(Plane = coalesce(plane_english, plane_french),
              Place = coalesce(place_english, place_french),
             )
Source: local data frame [2 x 6]
Groups: <by row>

# A tibble: 2 x 6
  place_english plane_english place_french plane_french Plane Place 
  <chr>         <chr>         <chr>        <chr>        <chr> <chr> 
1 NA            NA            azea         Three        Three azea  
2 ertert        ertt          NA           NA           ertt  ertert

You could also achieve the same for one column at a time using e.g.

df$Place <- coalesce(df$place_english, df$place_french)

score 3 · Answer 2 · answered Aug 05 '21 at 13:59

This should do the trick

df %>%
  as_tibble() %>% 
  mutate_if(is.character, list(~na_if(.,""))) %>% #only needed if the missing fields are stored as blanks and not already NA
  transmute(
    Date,
    Place = coalesce(Place_English, Place_French),
    Plane = coalesce(Plane_English, Plane_French)
  )

score 3 · Answer 3 · answered Aug 05 '21 at 14:03

Two approaches, both using dplyr

Case-1: If NA/missing values are there

df <- read.table(header = T, text = "Date   Place_English   Plane_English   Place_French    Plane_French
One NA NA   azea    Three
Two ertert  ertt    NA NA   ")

library(dplyr)

df %>%
  mutate(across(ends_with('_English'), ~ coalesce(., get(gsub('_English', '_French', cur_column()))),
                   .names = "{gsub('_English', '', .col)}"), .keep = 'unused')
#>   Date  Place Plane
#> 1  One   azea Three
#> 2  Two ertert  ertt

case-2: if there are empty strings instead

df <- read.table(header = T, text = "Date   Place_English   Plane_English   Place_French    Plane_French
One '' ''   azea    Three
Two ertert  ertt    ''  ''  ")
library(tidyverse)

df %>%
  mutate(across(ends_with('_English'), ~ paste0(., get(gsub('_English', '_French', cur_column()))),
                   .names = "{gsub('_English', '', .col)}"), .keep = 'unused')
#>   Date  Place Plane
#> 1  One   azea Three
#> 2  Two ertert  ertt

score 3 · Answer 4 · answered Aug 05 '21 at 14:32

In case there are >2 columns and you don't want to type them all, you can use same approach as @coffeinjunky, but with across

df <- data.frame(place_english = c(NA, "ertert"), 
                 plane_english = c(NA, "ertt"), 
                 place_french = c("azea", NA), 
                 plane_french=c("Three", NA),
                 stringsAsFactors = F)

library(dplyr, warn.conflicts = FALSE)

df %>% 
  transmute(place = do.call(coalesce, across(starts_with('place'))), 
            plane = do.call(coalesce, across(starts_with('plane'))))
#>    place plane
#> 1   azea Three
#> 2 ertert  ertt

^{Created on 2021-08-05 by the reprex package (v2.0.1)}

dy_by · Accepted Answer · 2021-08-06T08:11:17.037

use paste if you dont want lose any data

library(dplyr)
df%>% mutate(Place = paste(Place_English, Place_French),
             Plane = paste(Plane_English, Plane_French),
             across(Place_English:Plane_French, ~NULL)) ## last line to remove unnecessary columns

or coalesce if you want to get rid of NAs

df%>% mutate(Place = coalesce(Place_English, Place_French),
             Plane = coalesce(Plane_English, Plane_French),
             across(Place_English:Plane_French, ~NULL)) ## last line to remove unnecessary columns

if you want combine more than 2 cols, use unite from tidyr. set na.rm according to your preferences

library(tidyr)
df %>% 
  unite("Place", colnames(df)[grepl(pattern = "Place", colnames(df))] , remove = T, sep = " ", na.rm = TRUE) %>%  ## all cols including "Place" in name
  unite("Plane", colnames(df)[grepl(pattern = "Plane", colnames(df))] , remove = T, sep = " ", na.rm = TRUE) ## all cols including "Plane" in name

library(tidyr)
cols_to_paste <- colnames(df[,]) ## to choose only sepecified cols i.e. df[,15:25] or df[,c(15,18,20,25)]

df %>% 
  unite('Place', cols_to_paste[grepl(pattern = 'Place', cols_to_paste)] , remove = T, sep = " ", na.rm = TRUE) %>% ## all cols including "Place" in name
  unite('Plane', cols_to_paste[grepl(pattern = 'Plane', cols_to_paste)] , remove = T, sep = " ", na.rm = TRUE) ## all cols including "Plane" in name

If I don't want to lose data, is there a way to do this without having to name all the columns myself? — Josse_, Aug 05 '21 at 14:42
Sorry to be unclear, in the case there are way more columns originating from my survey, and not only 2 as in this example. For example in a way that columns 25:35 are ppasted under columns 15:25 — Josse_, Aug 06 '21 at 06:40
editted. check if this is what you are looking for. otherwise, lets chat about this — dy_by, Aug 06 '21 at 08:02

score 2 · Answer 6 · answered Aug 05 '21 at 14:38

2

Here is a base R approach using split.default which should work dynamically for any number of groups.

tmp <- df[-1]

result <- cbind(df[1], sapply(split.default(tmp, sub('_.*', '', names(tmp))),
                function(x) do.call(pmax, c(x, na.rm = TRUE))))

result

#  Date  Place Plane
#1  One   azea Three
#2  Two ertert  ertt

answered Aug 05 '21 at 14:38

Ronak Shah

377,200
20
156
213

It's weird that `split.default` doesn't have any documentation! Such a fantastic function. – Anoushiravan R Aug 06 '21 at 08:33

Merge 2 variables vertically R tidyverse

6 Answers6