4

I conducted a survey in 2 languages and I want to combine the questions in the 2 languages into one variable.

The answers of the form are all in the same data.frame. The date is my primary key. Unfortunately I am still new to R and could not find how to elegantly combine these.

Example As Is situation

Date Place_English Plane_English Place_French Plane_French
One azea Three
Two ertert ertt

To Be

Date Place Plane
One azea Three
Two ertert ertt
AnilGoyal
  • 25,297
  • 4
  • 27
  • 45
Josse_
  • 111
  • 8

6 Answers6

4

Just to follow up on my comment, assuming the empty values are NAs:

library(tidyverse)

Creating the data:

df <- data.frame(place_english = c(NA, "ertert"), 
                 plane_english = c(NA, "ertt"), 
                 place_french = c("azea", NA), 
                 plane_french=c("Three", NA),
                 stringsAsFactors = F)

Using coalesce to replace NAs with the first non-NA values:

df %>% mutate(Plane = coalesce(plane_english, plane_french),
              Place = coalesce(place_english, place_french),
             )
Source: local data frame [2 x 6]
Groups: <by row>

# A tibble: 2 x 6
  place_english plane_english place_french plane_french Plane Place 
  <chr>         <chr>         <chr>        <chr>        <chr> <chr> 
1 NA            NA            azea         Three        Three azea  
2 ertert        ertt          NA           NA           ertt  ertert

You could also achieve the same for one column at a time using e.g.

df$Place <- coalesce(df$place_english, df$place_french)
coffeinjunky
  • 11,254
  • 39
  • 57
3

This should do the trick

df %>%
  as_tibble() %>% 
  mutate_if(is.character, list(~na_if(.,""))) %>% #only needed if the missing fields are stored as blanks and not already NA
  transmute(
    Date,
    Place = coalesce(Place_English, Place_French),
    Plane = coalesce(Plane_English, Plane_French)
  )
Quixotic22
  • 2,894
  • 1
  • 6
  • 14
3

Two approaches, both using dplyr

Case-1: If NA/missing values are there

df <- read.table(header = T, text = "Date   Place_English   Plane_English   Place_French    Plane_French
One NA NA   azea    Three
Two ertert  ertt    NA NA   ")

library(dplyr)

df %>%
  mutate(across(ends_with('_English'), ~ coalesce(., get(gsub('_English', '_French', cur_column()))),
                   .names = "{gsub('_English', '', .col)}"), .keep = 'unused')
#>   Date  Place Plane
#> 1  One   azea Three
#> 2  Two ertert  ertt

case-2: if there are empty strings instead

df <- read.table(header = T, text = "Date   Place_English   Plane_English   Place_French    Plane_French
One '' ''   azea    Three
Two ertert  ertt    ''  ''  ")
library(tidyverse)

df %>%
  mutate(across(ends_with('_English'), ~ paste0(., get(gsub('_English', '_French', cur_column()))),
                   .names = "{gsub('_English', '', .col)}"), .keep = 'unused')
#>   Date  Place Plane
#> 1  One   azea Three
#> 2  Two ertert  ertt
AnilGoyal
  • 25,297
  • 4
  • 27
  • 45
3

In case there are >2 columns and you don't want to type them all, you can use same approach as @coffeinjunky, but with across

df <- data.frame(place_english = c(NA, "ertert"), 
                 plane_english = c(NA, "ertt"), 
                 place_french = c("azea", NA), 
                 plane_french=c("Three", NA),
                 stringsAsFactors = F)

library(dplyr, warn.conflicts = FALSE)

df %>% 
  transmute(place = do.call(coalesce, across(starts_with('place'))), 
            plane = do.call(coalesce, across(starts_with('plane'))))
#>    place plane
#> 1   azea Three
#> 2 ertert  ertt

Created on 2021-08-05 by the reprex package (v2.0.1)

IceCreamToucan
  • 28,083
  • 2
  • 22
  • 38
2

use paste if you dont want lose any data

library(dplyr)
df%>% mutate(Place = paste(Place_English, Place_French),
             Plane = paste(Plane_English, Plane_French),
             across(Place_English:Plane_French, ~NULL)) ## last line to remove unnecessary columns 

or coalesce if you want to get rid of NAs

df%>% mutate(Place = coalesce(Place_English, Place_French),
             Plane = coalesce(Plane_English, Plane_French),
             across(Place_English:Plane_French, ~NULL)) ## last line to remove unnecessary columns 

if you want combine more than 2 cols, use unite from tidyr. set na.rm according to your preferences

library(tidyr)
df %>% 
  unite("Place", colnames(df)[grepl(pattern = "Place", colnames(df))] , remove = T, sep = " ", na.rm = TRUE) %>%  ## all cols including "Place" in name
  unite("Plane", colnames(df)[grepl(pattern = "Plane", colnames(df))] , remove = T, sep = " ", na.rm = TRUE) ## all cols including "Plane" in name
library(tidyr)
cols_to_paste <- colnames(df[,]) ## to choose only sepecified cols i.e. df[,15:25] or df[,c(15,18,20,25)]

df %>% 
  unite('Place', cols_to_paste[grepl(pattern = 'Place', cols_to_paste)] , remove = T, sep = " ", na.rm = TRUE) %>% ## all cols including "Place" in name
  unite('Plane', cols_to_paste[grepl(pattern = 'Plane', cols_to_paste)] , remove = T, sep = " ", na.rm = TRUE) ## all cols including "Plane" in name
dy_by
  • 1,061
  • 1
  • 4
  • 13
  • If I don't want to lose data, is there a way to do this without having to name all the columns myself? – Josse_ Aug 05 '21 at 14:42
  • which columns do you mean? – dy_by Aug 05 '21 at 15:35
  • Sorry to be unclear, in the case there are way more columns originating from my survey, and not only 2 as in this example. For example in a way that columns 25:35 are ppasted under columns 15:25 – Josse_ Aug 06 '21 at 06:40
  • editted. check if this is what you are looking for. otherwise, lets chat about this – dy_by Aug 06 '21 at 08:02
2

Here is a base R approach using split.default which should work dynamically for any number of groups.

tmp <- df[-1]

result <- cbind(df[1], sapply(split.default(tmp, sub('_.*', '', names(tmp))),
                function(x) do.call(pmax, c(x, na.rm = TRUE))))

result

#  Date  Place Plane
#1  One   azea Three
#2  Two ertert  ertt
Ronak Shah
  • 377,200
  • 20
  • 156
  • 213