Retaining column information when melting multiple columns into one

Question

I have a data.table which I have melted follows:

library(data.table)
DT <- fread(
"ID country year Event_A Event_B
4   NLD   2002  0   1
5   NLD   2002  0   1
6   NLD   2006  1   1
7   NLD   2006  1   0
8   NLD   2006  1   1
9   GBR   2002  0   1
10  GBR   2002  0   0
11  GBR   2002  0   1
12  GBR   2006  1   1
13  GBR   2006  1   1",
header = TRUE)

melt(DT, id.var = setdiff(names(DT), c("Event_A", "Event_B")), 
          value.name = 'Event')[, variable := NULL][order(ID)]
#     ID country year Event
# 1:  4     NLD 2002     0
# 2:  4     NLD 2002     1
# 3:  5     NLD 2002     0
# 4:  5     NLD 2002     1
# 5:  6     NLD 2006     1
# 6:  6     NLD 2006     1
# 7:  7     NLD 2006     1
# 8:  7     NLD 2006     0
# 9:  8     NLD 2006     1
#10:  8     NLD 2006     1
#11:  9     GBR 2002     0
#12:  9     GBR 2002     1
#13: 10     GBR 2002     0
#14: 10     GBR 2002     0
#15: 11     GBR 2002     0
#16: 11     GBR 2002     1
#17: 12     GBR 2006     1
#18: 12     GBR 2006     1
#19: 13     GBR 2006     1
#20: 13     GBR 2006     1

However, in hindsight, I want to have the Event category in the melted data set. How do I make sure that this information is retained in the melted data?

EDIT (Due to oversimplification in original post):

DT <- fread(
"ID country year Event_A Event_B Choice_A Choice_B
4   NLD   2002  0   1  0   1
5   NLD   2002  0   1  1   1
6   NLD   2006  1   1  0   1
7   NLD   2006  1   0  1   1
8   NLD   2006  1   1  1   1
9   GBR   2002  0   1  1   0
10  GBR   2002  0   0  1   1
11  GBR   2002  0   1  0   1
12  GBR   2006  1   1  1   1
13  GBR   2006  1   1  0   0",
header = TRUE)

DT<- melt(DT, measure = patterns("^Event_", "^Choice_"), 
     value.name =  c("Event", "Choice"))[, variable :=  NULL][order(ID)]

Desired output:

#     ID country year Event Event_Cat Choice Choice_Cat
# 1:  4     NLD 2002     0  A         0      A
# 2:  4     NLD 2002     1  B         1      B
# 3:  5     NLD 2002     0  A
# 4:  5     NLD 2002     1  B
# 5:  6     NLD 2006     1  A
# 6:  6     NLD 2006     1  B
# 7:  7     NLD 2006     1
# 8:  7     NLD 2006     0
# 9:  8     NLD 2006     1
#10:  8     NLD 2006     1
#11:  9     GBR 2002     0
#12:  9     GBR 2002     1
#13: 10     GBR 2002     0
#14: 10     GBR 2002     0
#15: 11     GBR 2002     0
#16: 11     GBR 2002     1
#17: 12     GBR 2006     1
#18: 12     GBR 2006     1
#19: 13     GBR 2006     1
#20: 13     GBR 2006     1

Please ensure you provide a minimal working example so that extended discussions are prevented. — Edward, Mar 31 '20 at 09:37

Edward · Accepted Answer · 2020-03-31T09:58:32.780

Don't NULLify the variable.name:

setnames(
   melt(DT, id.var = setdiff(names(DT), c("Event_A", "Event_B")), value.name = 'Event')[
     , variable:=sub("Event_", "", variable)][order(ID)], 
   old="variable", new="Event_Cat")

    ID country year Event_Cat Event
 1:  4     NLD 2002         A     0
 2:  4     NLD 2002         B     1
 3:  5     NLD 2002         A     0
 4:  5     NLD 2002         B     1
 5:  6     NLD 2006         A     1
 6:  6     NLD 2006         B     1 ...

Edit, based on new information provided (melting multiple columns).

DT2 <- setnames(
  melt(DT, measure = patterns("^Event_", "^Choice_"), 
         value.name =  c("Event", "Choice"))[, variable := forcats::lvls_revalue(variable, 
            c("A", "B"))][order(ID)],
  old="variable", new="Cetegory")
DT2

    ID country year Cetegory Event Choice
 1:  4     NLD 2002        A     0      0
 2:  4     NLD 2002        B     1      1
 3:  5     NLD 2002        A     0      1
 4:  5     NLD 2002        B     1      1
 5:  6     NLD 2006        A     1      0
 6:  6     NLD 2006        B     1      1 ...

Thank you very much @Edward ! I have one question, I made my example a little bit to simple. I am actually using `measure = patterns("^Event_", "^Choice_")` instead of `c("Event_A", "Event_B")`. I have tried to adapt your code but I am not completely sure about the end.. `setnames( melt(DT, id.var = setdiff(names(DT), measure = patterns("^Event_", "^Choice_")), value.name = 'Event')[ , variable:=sub("Event_", "", variable)][order(ID)], old="variable", new="Event_Cat")` — Tom, Mar 31 '20 at 07:48
Thanks for your comment Edward, but I'm not sure I follow haha.. — Tom, Mar 31 '20 at 09:43

score 1 · Answer 2 · answered Mar 31 '20 at 07:31

1

You could use pivot_longer from tidyr :

tidyr::pivot_longer(DT, cols = starts_with('Event'), 
                    names_to = c('.value', 'Event_Cat'), 
                    names_sep = '_')


#     ID country  year Event_Cat Event
#   <int> <chr>   <int> <chr>     <int>
# 1     4 NLD      2002 A             0
# 2     4 NLD      2002 B             1
# 3     5 NLD      2002 A             0
# 4     5 NLD      2002 B             1
# 5     6 NLD      2006 A             1
# 6     6 NLD      2006 B             1
# 7     7 NLD      2006 A             1
# 8     7 NLD      2006 B             0
# 9     8 NLD      2006 A             1
#10     8 NLD      2006 B             1
#11     9 GBR      2002 A             0
#12     9 GBR      2002 B             1
#13    10 GBR      2002 A             0
#14    10 GBR      2002 B             0
#15    11 GBR      2002 A             0
#16    11 GBR      2002 B             1
#17    12 GBR      2006 A             1
#18    12 GBR      2006 B             1
#19    13 GBR      2006 A             1
#20    13 GBR      2006 B             1

answered Mar 31 '20 at 07:31

Ronak Shah

377,200
20
156
213

Thank you for your answer! I am using `measure = patterns("^Event_", "^Choice_")` in my actual data instead of `c("Event_A", "Event_B")`. What would be the equivalent of that for `starts_with`? I tried `starts_with(c('ECOST', 'Choice')` but that did not work. – Tom Mar 31 '20 at 07:42
1

@Tom use `matches` instead of `starts_with`, `matches('^Event|^Choice')` – Ronak Shah Mar 31 '20 at 07:46
Thanks, this regretfully throws some errors on the actual data. – Tom Mar 31 '20 at 08:08
@Tom Works for me on your updated data as well without any error. `tidyr::pivot_longer(DT, cols = matches('^Event|^Choice'), names_to = c('.value', 'Event_Cat'), names_sep = '_')` – Ronak Shah Mar 31 '20 at 08:09
I somehow get this warning: `Warning: Expected 2 pieces. Missing pieces filled with NA in 2 rows [1, 2]`, which makes `Event_Cat` NA. – Tom Mar 31 '20 at 08:11
I am not sure, I cannot reproduce the warning at my end. Also in your expected output, there is no point having `Event_Cat` and `Choice_Cat` as separate columns because they carry the exact same information. Other way would be to exclude thee columns that you don't want `tidyr::pivot_longer(DT, cols = -c(ID,country,year), names_to = c('.value', 'Event_Cat'), names_sep = '_')` – Ronak Shah Mar 31 '20 at 08:21
Yeah I figured. Thanks nonetheless! (Your other example gives the same warning) I'm hoping Edward will reply soon to see if I can get the DT solution to work. And yes you are right obviously. I just did not want to make the example possibly even more complicated. – Tom Mar 31 '20 at 08:24

Retaining column information when melting multiple columns into one

2 Answers2