I am struggling to find a programmatic way to split multiple columns on the same delimiter.

The solution should work for n columns, all identified by a common name pattern, in my example "^var[0-9]".

E.g.

library(tidyverse)
foo <- data.frame(var1 = paste0("a_",1:10), var2 = paste("a_",1:10), id = 1:10)

# Desired output

foo %>% 
  separate(var1, into = c("group1", "index1")) %>%
  separate(var2, into = c("group2", "index2"))
#>    group1 index1 group2 index2 id
#> 1       a      1      a      1  1
#> 2       a      2      a      2  2
#> 3       a      3      a      3  3
#> 4       a      4      a      4  4
#> 5       a      5      a      5  5
#> 6       a      6      a      6  6
#> 7       a      7      a      7  7
#> 8       a      8      a      8  8
#> 9       a      9      a      9  9
#> 10      a     10      a     10 10
tjebo

1 Answer


This isn't very elegant, but it works (based on answers here: Tidy method to split multiple columns using tidyr::separate):

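# Find every column whose name matches the pattern from the question
grep("^var[0-9]", names(foo), value = TRUE) %>% 
  map_dfc(~ foo %>% 
            select(all_of(.x)) %>%   # all_of() makes selection by character vector explicit
            separate(.x, 
                     # reuse the column's number as the suffix, e.g. group1, index1
                     into = paste0(c("group", "index"), gsub("[^0-9]", "", .x)))
  ) %>% 
  bind_cols(id = foo$id)   # re-attach the id column, which was not split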

Returns:

   group1 index1 group2 index2 id
1       a      1      a      1  1
2       a      2      a      2  2
3       a      3      a      3  3
4       a      4      a      4  4
5       a      5      a      5  5
6       a      6      a      6  6
7       a      7      a      7  7
8       a      8      a      8  8
9       a      9      a      9  9
10      a     10      a     10 10
semaphorism
  • Thanks! Also for finding the dupe :) I have flagged it and closed the question. – tjebo Oct 28 '20 at 21:20
  • The previous answer was almost, but not quite, a dupe: it didn't have to consider matching columns by regex, which has been implemented in the answer above. – semaphorism Oct 28 '20 at 21:53
  • Marking a question as a dupe does not have to mean the two are exactly the same, but they will be linked in a very obvious way, which makes it much easier to find related questions. Therefore I marked it as a dupe :) – tjebo Oct 29 '20 at 08:43
  • Also for future readers - contrary to what the poster said, I personally find this a fairly elegant solution to the posted problem. However, I noticed right after asking the question that for my subsequent analysis, it would be a much better approach to reshape the columns into long format. So if you have a similar problem - something to consider! – tjebo Oct 29 '20 at 08:48
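
For readers curious about the long-format route mentioned in the last comment, here is a minimal sketch of what it might look like. It is not taken from the thread: the names foo_long, var, value, group and index are illustrative assumptions, and it relies on separate()'s default separator (any run of non-alphanumeric characters) to handle both "a_1" and "a_ 1".

```r
library(tidyverse)

# Same example data as in the question
foo <- data.frame(var1 = paste0("a_", 1:10), var2 = paste("a_", 1:10), id = 1:10)

# Stack every column matching "^var[0-9]" into name/value pairs,
# then split the stacked values once instead of once per column.
# (foo_long, var, value, group and index are hypothetical names.)
foo_long <- foo %>%
  pivot_longer(cols = matches("^var[0-9]"), names_to = "var", values_to = "value") %>%
  separate(value, into = c("group", "index"))

head(foo_long)
```

Each original row becomes one row per matched column, so the example's 10 rows turn into 20, and the number of matched columns no longer affects the code.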