Merge survey columns across variables in R

Question

I am analyzing a very large survey in which I want to combine four parts of the survey, through several combinations of 4 questions. Below I have created a small example. A little background: a respondent either answered q2, q5, q8 or q9, because they only filled in 1 of 4 parts of the survey based on their answer in q1 (not shown here).Therefore, only one of the four columns contains an answer (1 or 2), while the others contain NAs. q2, q5, q8, q9 are similar questions that have the same answer options, which is why I want to combine them to make my dataset less wide and make it easier to further analyze the data.

q2_1 <- c(NA, NA, NA, NA, NA, NA, rep(c(1:2), 1))
q5_1 <- c(NA, NA, NA, NA, rep(c(1:2), 1), NA, NA)
q8_1 <- c(NA, NA, rep(c(1:2), 1), NA, NA, NA, NA)
q9_1 <- c(rep(c(1:2), 1), NA, NA, NA, NA, NA, NA)
q2_2 <- c(NA, NA, NA, NA, NA, NA, rep(c(1:2), 1))
q5_2 <- c(NA, NA, NA, NA, rep(c(1:2), 1), NA, NA)
q8_2 <- c(NA, NA, rep(c(1:2), 1), NA, NA, NA, NA)
q9_2 <- c(rep(c(1:2), 1), NA, NA, NA, NA, NA, NA)

df <- data.frame(q2_1, q5_1, q8_1, q9_1, q2_2, q5_2, q8_2, q9_2)
df

# running df shows: 
  q2_1 q5_1 q8_1 q9_1 q2_2 q5_2 q8_2 q9_2
1   NA   NA   NA    1   NA   NA   NA    1
2   NA   NA   NA    2   NA   NA   NA    2
3   NA   NA    1   NA   NA   NA    1   NA
4   NA   NA    2   NA   NA   NA    2   NA
5   NA    1   NA   NA   NA    1   NA   NA
6   NA    2   NA   NA   NA    2   NA   NA
7    1   NA   NA   NA    1   NA   NA   NA
8    2   NA   NA   NA    2   NA   NA   NA

My desired end result would be a dataframe with only columns for questions starting with q2_ (so, in the example that would be q2_1 and q2_2; in reality there's about 20 for this question), but with the NAs replaced for the answer options from the corresponding q5_, q8_, and q_9.

# desired end result
  q2_1 q2_2 
1    1   1
2    1   2
3    1   1   
4    2   2   
5    1   1   
6    2   2   
7    1   1   
8    2   2

For single questions, i've done this using the code below, but this is very manual and because q2, q5, q8, and q9 both go up to _20, I'm looking for a way to automate this more.

# example single question
library(tidyverse)
df <- df %>%
  mutate(q2_1 = case_when(!is.na(q2_1) ~ q2_1, 
                          !is.na(q5_1) ~ q5_1,
                          !is.na(q8_1) ~ q8_1,
                          !is.na(q9_1) ~ q9_1))

I hope I explained myself well enough and looking forward for some directions!

Yuriy Saraykin · Answer 1 · 2022-04-04T09:59:24.053

q2_1 <- c(NA, NA, NA, NA, NA, NA, rep(c(1:2), 1))
q5_1 <- c(NA, NA, NA, NA, rep(c(1:2), 1), NA, NA)
q8_1 <- c(NA, NA, rep(c(1:2), 1), NA, NA, NA, NA)
q9_1 <- c(rep(c(1:2), 1), NA, NA, NA, NA, NA, NA)
q2_2 <- c(NA, NA, NA, NA, NA, NA, rep(c(1:2), 1))
q5_2 <- c(NA, NA, NA, NA, rep(c(1:2), 1), NA, NA)
q8_2 <- c(NA, NA, rep(c(1:2), 1), NA, NA, NA, NA)
q9_2 <- c(rep(c(1:2), 1), NA, NA, NA, NA, NA, NA)

df <- data.frame(q2_1, q5_1, q8_1, q9_1, q2_2, q5_2, q8_2, q9_2)
df
#>   q2_1 q5_1 q8_1 q9_1 q2_2 q5_2 q8_2 q9_2
#> 1   NA   NA   NA    1   NA   NA   NA    1
#> 2   NA   NA   NA    2   NA   NA   NA    2
#> 3   NA   NA    1   NA   NA   NA    1   NA
#> 4   NA   NA    2   NA   NA   NA    2   NA
#> 5   NA    1   NA   NA   NA    1   NA   NA
#> 6   NA    2   NA   NA   NA    2   NA   NA
#> 7    1   NA   NA   NA    1   NA   NA   NA
#> 8    2   NA   NA   NA    2   NA   NA   NA

library(tidyverse)

suffix <- str_c("_", 1:2)

map_dfc(.x = suffix,
    .f = ~ transmute(df, !!str_c("q2", .x) := rowSums(across(ends_with(.x
    )), na.rm = T)))
#>   q2_1 q2_2
#> 1    1    1
#> 2    2    2
#> 3    1    1
#> 4    2    2
#> 5    1    1
#> 6    2    2
#> 7    1    1
#> 8    2    2

^{Created on 2022-04-04 by the reprex package (v2.0.1)}

Aron Strandberg · Accepted Answer · 2022-04-04T09:57:35.210

1

Here's one way, using coalesce:

df %>%
  mutate(q2_1 = do.call(coalesce, across(ends_with('_1'))),
         q2_2 = do.call(coalesce, across(ends_with('_2')))) %>%
  select(q2_1, q2_2)

#>   q2_1 q2_2
#> 1    1    1
#> 2    2    2
#> 3    1    1
#> 4    2    2
#> 5    1    1
#> 6    2    2
#> 7    1    1
#> 8    2    2

edited Apr 04 '22 at 09:57

answered Apr 04 '22 at 09:52

Aron Strandberg

3,040
9
15

1

Much easier than my solution! I did not think of using coalesce that way – Maël Apr 04 '22 at 09:54
Thank you, this is a very intuitive answer! Do you have some advice on how to use or transform this code when the dataset contains (many!) other questions ending with _1, _2 and so on, that I don't want to include? I know this is not mentioned in my original question, but maybe you could give me some pointers? :) – Demi Apr 04 '22 at 10:05
Great! Check the selection functions in the tidyselect package and see if there's something that works better than ends_with() for you. Worst case, select them all explicitly: mutate(q2_1 = do.call(coalesce, across(matches('q2_1|q3_1|q4_1|q5_1|q6_1')))) – Aron Strandberg Apr 04 '22 at 10:57

Merge survey columns across variables in R

2 Answers2