Is there any way in R to convert columns to rows while keeping the column names?

Example input:

A   B
1   1
2   3
3   4
44  5

Desired output:

 Group Number
  A       1
  A       2
  A       3
  A       44 
  B       1
  B       3
  B       4
  B       5
user1981275

2 Answers

There is no need for reshape2; the stack function from base R does this.

With your.data holding your example data frame:

# stack() returns two columns named "values" and "ind"
res <- stack(your.data)
colnames(res) <- c("Number", "Group")

gives you

> res
  Number Group
1      1     A
2      2     A
3      3     A
4     44     A
5      1     B
6      3     B
7      4     B
8      5     B

See also here.
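
The question asks for the Group column first and the Number column second, while stack returns the values first. A minimal base-R sketch of getting the requested layout is below; the name your.data and its contents are assumed from the example input in the question.

```r
# Example input from the question (assumed to be a data frame)
your.data <- data.frame(A = c(1, 2, 3, 44), B = c(1, 3, 4, 5))

# stack() returns the columns "values" and "ind", in that order
res <- stack(your.data)

# Swap the column order, then rename to match the requested output
res <- res[, c("ind", "values")]
names(res) <- c("Group", "Number")

print(res)
```

Note that Group comes back as a factor; wrap it in as.character() if plain strings are needed.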


Benchmarking melt from reshape2 and stack from base on bigger data:

library(reshape2)
library(microbenchmark)

# 10,000 rows x 100 columns of random integers between 1 and 20
set.seed(45)
DF <- data.frame(matrix(sample(20, 1e6, TRUE), ncol=100))

microbenchmark(stack(DF), melt(DF), times=100)

Unit: milliseconds
      expr      min       lq   median       uq      max neval
 stack(DF) 157.7084 187.1993 241.8206 251.7132 334.5488   100
  melt(DF) 174.6079 253.1088 261.6234 273.3971 443.9953   100

stack looks faster, but only by about 20 milliseconds at the median (roughly 242 ms versus 262 ms).

user1981275
  • @Thomas, it'd be also useful to edit the benchmarking in the post (preferably with larger data). – Arun Jul 30 '13 at 13:56
  • @Thomas, doesn't seem twice as fast... at least with this data size. – Arun Jul 30 '13 at 14:05
  • If you mean by doing "system.time", then yes, it's very likely to be inconsistent. – Arun Jul 30 '13 at 14:06
  • I just ran 5 times, on new session each... The median time differs, but the difference between the two are in the range of 20 and 40 ms. – Arun Jul 30 '13 at 14:08
  • @Arun Thanks for adding the benchmarking, I see about 50 ms difference. – user1981275 Jul 30 '13 at 14:17
  • @RomanLuštrik Very interesting. Perhaps you should edit both of those tests (or the link to them) into this post. Then we can clean up these comments. – Thomas Jul 30 '13 at 14:32

I use reshape2.

> x <- data.frame(A = 1:5, B = 55:51)
> library(reshape2)
> melt(x)
Using  as id variables
   variable value
1         A     1
2         A     2
3         A     3
4         A     4
5         A     5
6         B    55
7         B    54
8         B    53
9         B    52
10        B    51

It was interesting to see the benchmarks. By default, melt prints a message about the id variables it guessed; naming the measure variables explicitly in the call turns that message off.

> microbenchmark(stack(DF), melt(DF), times=100)
Unit: milliseconds
      expr      min       lq   median       uq      max neval
 stack(DF) 122.3086 133.8435 139.6990 180.5338 250.9316   100
  melt(DF) 140.0183 198.2025 227.8125 245.3444 367.1552   100

I find the difference small, and it gets smaller once melt's message is turned off. It looks like my hunch to turn verbose mode off in my simulations may have helped.

> microbenchmark(stack(DF), melt(DF, measure.vars = names(DF)[grepl("X", names(DF))]), times=100)
Unit: milliseconds
                                                      expr      min       lq   median       uq      max neval
                                                 stack(DF) 94.33681 124.9958 132.1747 144.7323 287.7438   100
 melt(DF, measure.vars = names(DF)[grepl("X", names(DF))]) 99.44282 141.0594 150.2625 178.8081 249.0888   100
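
For the small example, the same explicit style of call can also name the output columns directly. This is only a sketch: it assumes reshape2 is installed and uses melt's variable.name and value.name arguments, with Group and Number taken from the question's desired output.

```r
library(reshape2)

x <- data.frame(A = 1:5, B = 55:51)

# Naming measure.vars explicitly avoids the id-variable message;
# variable.name and value.name replace the default "variable"/"value"
long <- melt(
  x,
  measure.vars  = names(x),
  variable.name = "Group",
  value.name    = "Number"
)

print(long)
```
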
Roman Luštrik