0

I have a data.frame where multiple columns needs to be melted together based on the suffix of the column. So all columns ending with "from" should be melted into one column, same for columns ending on "to" and all columns without suffix together and here lies my problem as I cannot melt without appending the column names without suffix with "xxx". The question is how do I match an empty suffix with the regex formulation in names_pattern or is there a different solution without renaming? I am also interested in data.table solutions to this problem.

library(tibble)
library(magrittr)
library(tidyr)

data <-
  tibble::tribble(
  ~"abc", ~"abcfrom", ~"abcto", ~"def", ~"deffrom", ~"defto",
  1, "2019-05-16", NA, 0, NA, NA,
  1, "2020-01-01", "2020-10-15", 1, "2014-12-17", "2015-03-05",
  1, NA, NA, 1, "2015-01-01", NA
)

data %>% 
  dplyr::rename("abcxxx" = "abc", "defxxx" = "def") %>% 
  tidyr::pivot_longer(
    everything(),
    names_to = c("variable", ".value"),
    names_pattern = "(.+)(xxx|from|to)"
  )
# A tibble: 6 x 4
  variable   xxx from       to        
  <chr>    <dbl> <chr>      <chr>     
1 abc          1 2019-05-16 NA        
2 def          0 NA         NA        
3 abc          1 2020-01-01 2020-10-15
4 def          1 2014-12-17 2015-03-05
5 abc          1 NA         NA        
6 def          1 2015-01-01 NA
needRhelp
  • 2,948
  • 2
  • 24
  • 48

2 Answers2

0

Here is an option

library(dplyr)
library(tidyr)
library(stringr)
data %>% 
 rename_at(vars(names(.)[!str_detect(names(.), "(from|to)$")]),
        ~ str_c(., 'xxx')) %>% 
 tidyr::pivot_longer(
 everything(),
 names_to = c("variable", ".value"),
 names_pattern = "(.+)(xxx|from|to)"
)

-output

# A tibble: 6 x 4
#  variable   xxx from       to        
#  <chr>    <dbl> <chr>      <chr>     
#1 abc          1 2019-05-16 <NA>      
#2 def          0 <NA>       <NA>      
#3 abc          1 2020-01-01 2020-10-15
#4 def          1 2014-12-17 2015-03-05
#5 abc          1 <NA>       <NA>      
#6 def          1 2015-01-01 <NA>      
akrun
  • 874,273
  • 37
  • 540
  • 662
0

Here is a data.table approach.

It does not really melt, but splits into chunks of (three) columns based on identical first three characters of the colname. Then set the colnames of the chunks, en rowbind them together again..

library( data.table )
setDT( data )

#assuming the first three characters define the group
data.split <- split.default(data, gsub('(^.{3}).*$', '\\1', names(data)))
# $abc
#    abc    abcfrom      abcto
# 1:   1 2019-05-16       <NA>
# 2:   1 2020-01-01 2020-10-15
# 3:   1       <NA>       <NA>
#   
# $def
#    def    deffrom      defto
# 1:   0       <NA>       <NA>
# 2:   1 2014-12-17 2015-03-05
# 3:   1 2015-01-01       <NA>

#set column names (fixed here, but can also be variable if desired)
data.split <- lapply( data.split, setnames, new = c("xxx", "from", "to") )

#bind together
DT <- rbindlist( data.split, use.names = TRUE, idcol = "variable" )
#    variable xxx       from         to
# 1:      abc   1 2019-05-16       <NA>
# 2:      abc   1 2020-01-01 2020-10-15
# 3:      abc   1       <NA>       <NA>
# 4:      def   0       <NA>       <NA>
# 5:      def   1 2014-12-17 2015-03-05
# 6:      def   1 2015-01-01       <NA>
Wimpel
  • 26,031
  • 1
  • 20
  • 37