Converting wide table to long programmatically

Question

My example dataset is wide and contains these values:

    olddata_wide <- read.table(header=TRUE, text='
 subject sex a b c a1 b1 c1  a2 b2 c2 
       1   M     7.9  12.3  10.7 7.5  12.1  10.3 8.1  12.5  10.9
       2   F     6.3  10.6  11.1 6.0  10.4  11.0 6.5  10.9  11.4
       3   F     9.5  13.1  13.8 9.3  13.0  13.5 9.8  13.5  13.9
       4   M    11.5  13.4  12.9 11.2  13.5  12.7 11.7  13.6  13.9
')

I would like to convert it to a long dataset. The problem is that I want to use multiple keycols at the same time - I want the columns a & b & c to become one long column called value, as well as columns a1 & b1 & c1 to value1 and a2 & b2 & c2 to value3. So the desired outcome is:

    subject sex value valueType value1 valueType1 value2 valueType2
 1:       1   M   7.9         a    7.5         a1    8.1         a2
 2:       2   F   6.3         a    6.0         a1    6.5         a2
 3:       3   F   9.5         a    9.3         a1    9.8         a2
 4:       4   M  11.5         a   11.2         a1   11.7         a2
 5:       1   M  12.3         b   12.1         b1   12.5         b2
 6:       2   F  10.6         b   10.4         b1   10.9         b2
 7:       3   F  13.1         b   13.0         b1   13.5         b2
 8:       4   M  13.4         b   13.5         b1   13.6         b2
 9:       1   M  10.7         c   10.3         c1   10.9         c2
10:       2   F  11.1         c   11.0         c1   11.4         c2
11:       3   F  13.8         c   13.5         c1   13.9         c2
12:       4   M  12.9         c   12.7         c1   13.9         c2

I know how to get the desired outcome programmatically for one key column:

keycol <- "valueType"
valuecol <- "value"
gathercols <- c("a", "b", "c")

gather_(olddata_wide, keycol, valuecol, gathercols)

But how can I do this for multiple keycols at the same time?

score 2 · Accepted Answer · answered Sep 29 '22 at 15:54

A data.table idea:

olddata_wide[, melt(
                 .SD, 
                 id.vars = c("subject", "sex"), 
                 measure.vars = patterns(valueType = "[a-c]$", valueType1 = '1$', valueType2 = '2$'))
             ][, variable := letters[variable]][]

#     subject    sex variable valueType valueType1 valueType2
#       <int> <char>   <char>     <num>      <num>      <num>
#  1:       1      M        a       7.9        7.5        8.1
#  2:       2      F        a       6.3        6.0        6.5
#  3:       3      F        a       9.5        9.3        9.8
#  4:       4      M        a      11.5       11.2       11.7
#  5:       1      M        b      12.3       12.1       12.5
#  6:       2      F        b      10.6       10.4       10.9
#  7:       3      F        b      13.1       13.0       13.5
#  8:       4      M        b      13.4       13.5       13.6
#  9:       1      M        c      10.7       10.3       10.9
# 10:       2      F        c      11.1       11.0       11.4
# 11:       3      F        c      13.8       13.5       13.9
# 12:       4      M        c      12.9       12.7       13.9

could do explain what the last part does ? [, variable := letters[variable]][] I am actually using different column names on my real data, but I end up having a variable column with the contents a, b, and c — Nneka, Sep 30 '22 at 13:31
Ok what is the patterns you use for `valueType`? You could maybe reuse it like this: `[, variable := grep("[a-c]$", names(olddata_wide), value = TRUE)[variable]][]` — s_baldur, Sep 30 '22 at 14:20

an_ja · Answer 2 · 2022-09-29T14:12:22.810

0

There are different solutions.

If your columns have names that don't follow a certain pattern, then I'd go with the sjmisc package.

    sjmisc::reshape_longer(
    olddata_wide ,
    columns = list(
    c("a", "b", "c"),
    c("a1", "b1", "c1"),
    c("a2", "b2", "c2")),
    values.to = c("value", "value1", "value2"))

You could also use data.table.

melt(setDT(olddata_wide), 
     measure = patterns("^[^0-9]$","[abc]+1", "[abc]+2"),
     variable.name = c("id"),
     value.name = c("value","value1", "value2"))

edited Sep 29 '22 at 14:12

answered Sep 29 '22 at 13:54

an_ja

427
2
13

unfortunately I am not able to install that package (company policy), what's the other option? – Nneka Sep 29 '22 at 13:56
I edited my answer with a data.table solution – an_ja Sep 29 '22 at 14:13

langtang · Answer 3 · 2022-09-29T15:07:51.107

Here is one approach using melt() and dcast() from the data.table package.

setDT(olddata_wide)
setnames(olddata_wide, old=c("a","b","c"),  new=c("a0", "b0", "c0"))

df_long = dcast(
  melt(olddata_wide,c("subject","sex"),variable.name = "v")[,(c("v", "t")):=tstrsplit(v,"")],
  subject+sex+v~t, value.var="value",
)

df_long[, .(subject, sex, value=`0`,valueType=v, value1=`1`,valueType1 = paste0(v,"1"), value2=`2`,valueType2 = paste0(v,"2"))]

Output:

    subject sex value valueType value1 valueType1 value2 valueType2
 1:       1   M   7.9         a    7.5         a1    8.1         a2
 2:       1   M  12.3         b   12.1         b1   12.5         b2
 3:       1   M  10.7         c   10.3         c1   10.9         c2
 4:       2   F   6.3         a    6.0         a1    6.5         a2
 5:       2   F  10.6         b   10.4         b1   10.9         b2
 6:       2   F  11.1         c   11.0         c1   11.4         c2
 7:       3   F   9.5         a    9.3         a1    9.8         a2
 8:       3   F  13.1         b   13.0         b1   13.5         b2
 9:       3   F  13.8         c   13.5         c1   13.9         c2
10:       4   M  11.5         a   11.2         a1   11.7         a2
11:       4   M  13.4         b   13.5         b1   13.6         b2
12:       4   M  12.9         c   12.7         c1   13.9         c2

Here is another approach (also leverages data.table)

do.call(cbind,lapply(1:3, \(i) {
  res = melt(setDT(olddata_wide[,c(1:2,((i*3):(i*3+2)))]),c("subject","sex"),variable.name="valueType")
  if(i>1) res <- setNames(res[,-c(1,2)],paste0(names(res)[-c(1,2)],i-1))
  res
}))

Maurits Evers · Answer 4 · 2022-09-29T14:08:11.463

Here is a (fairly clunky) tidyverse approach:

olddata_wide %>%
    pivot_longer(matches("^[abc]"), names_to = "valueType") %>%
    mutate(suffix = str_remove(valueType, "^.")) %>%
    pivot_wider(
        names_from = "suffix", values_from = c("value", "valueType"), names_sep = "", values_fn = list) %>%
    unnest(matches("value"))
## A tibble: 12 × 8
#   subject sex   value value1 value2 valueType valueType1 valueType2
#     <int> <chr> <dbl>  <dbl>  <dbl> <chr>     <chr>      <chr>     
# 1       1 M       7.9    7.5    8.1 a         a1         a2        
# 2       1 M      12.3   12.1   12.5 b         b1         b2        
# 3       1 M      10.7   10.3   10.9 c         c1         c2        
# 4       2 F       6.3    6      6.5 a         a1         a2        
# 5       2 F      10.6   10.4   10.9 b         b1         b2        
# 6       2 F      11.1   11     11.4 c         c1         c2        
# 7       3 F       9.5    9.3    9.8 a         a1         a2        
# 8       3 F      13.1   13     13.5 b         b1         b2        
# 9       3 F      13.8   13.5   13.9 c         c1         c2        
#10       4 M      11.5   11.2   11.7 a         a1         a2        
#11       4 M      13.4   13.5   13.6 b         b1         b2        
#12       4 M      12.9   12.7   13.9 c         c1         c2

The general idea is to reshape all columns matching "^[abc]" from wide to long, and then rebuild into a wide format according to your expected output.

score 0 · Answer 5 · answered Sep 30 '22 at 19:23

I use a data.table approach similar to some of the others but break the process into smaller, discrete steps, allowing (I think) a easily modified procedure though with more lines of code.

The original data set converted to a data.table object.

library(data.table)
olddata_wide <- read.table(header = TRUE, text = '
 subject sex       a     b     c   a1    b1    c1   a2    b2    c2 
       1   M     7.9  12.3  10.7  7.5  12.1  10.3  8.1  12.5  10.9
       2   F     6.3  10.6  11.1  6.0  10.4  11.0  6.5  10.9  11.4
       3   F     9.5  13.1  13.8  9.3  13.0  13.5  9.8  13.5  13.9
       4   M    11.5  13.4  12.9 11.2  13.5  12.7 11.7  13.6  13.9
')
setDT(olddata_wide)

Transform to block-record form with all numerical values in one column.

DT <- melt(olddata_wide, 
           id.vars = c("subject", "sex"), 
           variable.name = "type", 
           value.name = "value", 
           variable.factor = FALSE)
DT
#>     subject    sex   type value
#>       <int> <char> <char> <num>
#>  1:       1      M      a   7.9
#>  2:       2      F      a   6.3
#>  3:       3      F      a   9.5
#>  4:       4      M      a  11.5
#>  5:       1      M      b  12.3
#>  6:       2      F      b  10.6
#> ---                            
#> 31:       3      F     b2  13.5
#> 32:       4      M     b2  13.6
#> 33:       1      M     c2  10.9
#> 34:       2      F     c2  11.4
#> 35:       3      F     c2  13.9
#> 36:       4      M     c2  13.9

Assign valueType corresponding to the subscripts on type (i.e., a, a1, a2, etc.)

DT[, valueType := fcase(
  type %ilike% "1", "value1", 
  type %ilike% "2", "value2", 
  default = "value0"
)]
DT
#>     subject    sex   type value valueType
#>       <int> <char> <char> <num>    <char>
#>  1:       1      M      a   7.9    value0
#>  2:       2      F      a   6.3    value0
#>  3:       3      F      a   9.5    value0
#>  4:       4      M      a  11.5    value0
#>  5:       1      M      b  12.3    value0
#>  6:       2      F      b  10.6    value0
#> ---                                      
#> 31:       3      F     b2  13.5    value2
#> 32:       4      M     b2  13.6    value2
#> 33:       1      M     c2  10.9    value2
#> 34:       2      F     c2  11.4    value2
#> 35:       3      F     c2  13.9    value2
#> 36:       4      M     c2  13.9    value2

Delete the subscripts from a1, a2, b1, etc. to facilitate the next transformation.

DT[, type := substr(type, 1, 1)]
DT
#>     subject    sex   type value valueType
#>       <int> <char> <char> <num>    <char>
#>  1:       1      M      a   7.9    value0
#>  2:       2      F      a   6.3    value0
#>  3:       3      F      a   9.5    value0
#>  4:       4      M      a  11.5    value0
#>  5:       1      M      b  12.3    value0
#>  6:       2      F      b  10.6    value0
#> ---                                      
#> 31:       3      F      b  13.5    value2
#> 32:       4      M      b  13.6    value2
#> 33:       1      M      c  10.9    value2
#> 34:       2      F      c  11.4    value2
#> 35:       3      F      c  13.9    value2
#> 36:       4      M      c  13.9    value2

Transform to the desired form.

DT <- dcast(DT, subject + sex + type ~ valueType, value.var = "value")
setorderv(DT, c("type", "subject"))
DT
#>     subject    sex   type value0 value1 value2
#>       <int> <char> <char>  <num>  <num>  <num>
#>  1:       1      M      a    7.9    7.5    8.1
#>  2:       2      F      a    6.3    6.0    6.5
#>  3:       3      F      a    9.5    9.3    9.8
#>  4:       4      M      a   11.5   11.2   11.7
#>  5:       1      M      b   12.3   12.1   12.5
#>  6:       2      F      b   10.6   10.4   10.9
#>  7:       3      F      b   13.1   13.0   13.5
#>  8:       4      M      b   13.4   13.5   13.6
#>  9:       1      M      c   10.7   10.3   10.9
#> 10:       2      F      c   11.1   11.0   11.4
#> 11:       3      F      c   13.8   13.5   13.9
#> 12:       4      M      c   12.9   12.7   13.9

Converting wide table to long programmatically

5 Answers5