1

I am struggling with R reshape function all the time and still would like to use it consistently as it is R-core. I stumbled onto this problem having wide dataframe like this

dw <- read.table(header=T, text='
 sbj A.f1.avg A.f1.sd A.f2.avg A.f2.sd B.f1.avg B.f1.sd B.f2.avg B.f2.sd
   A       10       6       50      10       10       6       50      10  
   B       12       5       70      11       12       5       70      11  
   C       20       7       20      8        20       7       20      8   
   D       22       8       22      9        22       8       22      9   
 ')    

and I wish for this outcome:

sbj AB  f   var res
A   A   f1  avg 10
B   A   f1  avg 12
C   A   f1  avg 20
D   A   f1  avg 22
A   A   f1  sd  6
B   A   f1  sd  5
C   A   f1  sd  7
D   A   f1  sd  8
A   A   f2  avg 50
B   A   f2  avg 70
C   A   f2  avg 20
D   A   f2  avg 22
A   A   f2  sd  10
B   A   f2  sd  11
C   A   f2  sd  8
D   A   f2  sd  9
A   B   f1  avg 10
B   B   f1  avg 12
C   B   f1  avg 20
D   B   f1  avg 22
A   B   f1  sd  6
B   B   f1  sd  5
C   B   f1  sd  7
D   B   f1  sd  8
A   B   f2  avg 50
B   B   f2  avg 70
C   B   f2  avg 20
D   B   f2  avg 22
A   B   f2  sd  10
B   B   f2  sd  11
C   B   f2  sd  8
D   B   f2  sd  9

Please, help!

striatum
  • 1,428
  • 3
  • 14
  • 31

3 Answers3

2

With dplyr and tidyr:

library(dplyr)
library(tidyr)

X <- gather(dw, var, res, -sbj) %>%
    separate(var, into = c("AB", "f", "var"))

Result:

> head(X)
  sbj AB  f var res
1   A  A f1 avg  10
2   B  A f1 avg  12
3   C  A f1 avg  20
4   D  A f1 avg  22
5   A  A f1  sd   6
6   B  A f1  sd   5
ulfelder
  • 5,305
  • 1
  • 22
  • 40
2

Using reshape2...

library(reshape2)
dw2 <- melt(dw, id.vars="sbj", value.name="res") #melt to long format

#create new variables by splitting column at dots
dw2[, c("AB", "f", "var")] <- t(as.data.frame((strsplit(as.character(dw2$variable),"\\."))))

#reorder variables
dw2 <- dw2[,c("sbj", "AB", "f", "var", "res")]

dw2
   sbj AB  f var   res
1    A  A f1 avg    10
2    B  A f1 avg    12
3    C  A f1 avg    20
4    D  A f1 avg    22
5    A  A f1  sd     6
6    B  A f1  sd     5
7    C  A f1  sd     7
8    D  A f1  sd     8
9    A  A f2 avg    50
10   B  A f2 avg    70
11   C  A f2 avg    20
12   D  A f2 avg    22
13   A  A f2  sd    10
14   B  A f2  sd    11
15   C  A f2  sd     8
16   D  A f2  sd     9
17   A  B f1 avg    10
18   B  B f1 avg    12
19   C  B f1 avg    20
20   D  B f1 avg    22
21   A  B f1  sd     6
22   B  B f1  sd     5
23   C  B f1  sd     7
24   D  B f1  sd     8
25   A  B f2 avg    50
26   B  B f2 avg    70
27   C  B f2 avg    20
28   D  B f2 avg    22
29   A  B f2  sd    10
30   B  B f2  sd    11
31   C  B f2  sd     8
32   D  B f2  sd     9
Andrew Gustar
  • 17,295
  • 1
  • 22
  • 32
1

Here is a base R method that uses stack and cbind for the initial "reshape long" and then performs a second step to decompose the variable names of dw into separate columns.

# get a 3 column df
dat <- cbind(dw[1], stack(dw[-1]))

# split the variable names, reorder columns, and provide desired names
dat <- setNames(cbind(dat[1],
                      do.call(rbind, strsplit(as.character(dat$ind), ".", fixed=TRUE)),
                      dat[2]),
                 c("sbj", "AB", "f", "var", "res"))
lmo
  • 37,904
  • 9
  • 56
  • 69