0

I have some data similar to this example data, where columns stand for years:

data <- data.frame(
  X2020_1 = c("A", NA, "B"),
  X2020_2 = c("A", "C", NA),
  X2021_1 = c("A", NA, "C"),
  X2021_2 = c(NA, NA, "A")
)

which looks like this:

  X2020_1 X2020_2 X2021_1 X2021_2
1       A       A       A    <NA>
2    <NA>       C    <NA>    <NA>
3       B    <NA>       C       A

I want to group the data by columns based on the column name. I want anything with X2020 in its name to be in a new X2020 column, and anything with X2021 to be in a new X2021 column. I also want to eliminate NA values unless there are only NAs in that column-group.

My desired end result data should look like this:

    X2020   X2021
1 c(A, A)       A
2       C    <NA>
3       B c(C, A)

I'm not really sure how to get there.

ifoxfoot
  • 195
  • 7

1 Answers1

2

I have two solutions, the first one is more manual and the second one is more dynamic.

unite()

From the tidyr package, there's an unite() function that allows you to combine multiple columns into a single one. But that requires one unite function per pattern (or year in this case).

library(tidyverse)

data %>% 
  unite("X2020", starts_with("X2020"), na.rm = T, sep = ", ") %>% 
  unite("X2021", starts_with("X2021"), na.rm = T, sep = ", ") %>% 
  mutate(across(everything(), ~ifelse(.x == "", NA, .x)))

Reshape data

For the second solution, you can first reshape your data into a "long" format, then summarise the column per row number and "year" to concatenate the values, and reshape back to "wide" format.

This is more dynamic and do not require hard-coding patterns.

library(tidyverse)

data %>% 
  mutate(rn = row_number()) %>% 
  pivot_longer(-rn) %>% 
  mutate(name = sub("_\\d", "", name)) %>% 
  group_by(name, rn) %>% 
  summarize(value = toString(na.omit(value))) %>%
  pivot_wider() %>% 
  select(-rn) %>% 
  mutate(across(everything(), ~ifelse(.x == "", NA, .x)))

Output

Both solutions give the same output.

# A tibble: 3 × 2
  X2020 X2021
  <chr> <chr>
1 A, A  A    
2 C     NA   
3 B     C, A 
benson23
  • 16,369
  • 9
  • 19
  • 38