How to use melt() in r for my situation?

Question

I have been trying to use melt() function in r to shape my dataframe.

Here is the original,

     group StartX StartY   EndX   EndY id
  18878713 524897 180779 525352 179484  1
  18884056 531199 183111 532538 182503  2

I want to shape it to this,

     group Variable  Value id
  18878713   524897 180779  1
  18884056   531199 183111  2
  18878713   525352 179484  1
  18884056   532538 182503  2

Any idea how I can do this? Thanks!

`melt` from `data.table v1.9.8` (next release is 1.9.6, shortly, and the one after that) will be able to melt multiple columns. If you want to give it a try, you can use `devtools` to install from the 1.9.8 branch. — Arun, Nov 15 '14 at 19:15
@Arun, I'm waiting for that... Until then, [there's `merged.stack`](http://stackoverflow.com/a/26954394/1270695) for the impatient :-) — A5C1D2H2I1M1N2O1R2T1, Nov 17 '14 at 06:41

jlhoward · Accepted Answer · 2014-11-15T19:54:46.517

Here's another base R solution using reshape(...).

result <- reshape(df,idvar=c(1,6),times=c("Start","End"),v.names=c("X","Y"),
                  varying=list(c(2,4),c(3,5)),direction="long")
result
#                    group id  time      X      Y
# 18878713.1.Start 18878713  1 Start 524897 180779
# 18884056.2.Start 18884056  2 Start 531199 183111
# 18878713.1.End   18878713  1   End 525352 179484
# 18884056.2.End   18884056  2   End 532538 182503

You can get rid of the time column with

result$time <- NULL

If you insist on using melt(...), which is not really designed for this type of problem, here is one way.

library(reshape2)
m.1 <- melt(df,id=c(1,6),measure=c(2,4),value.name="X")
m.2 <- melt(df,id=c(1,6),measure=c(3,5),value.name="Y")
result <- data.frame(m.1[,-3],Y=m.2$Y)
result
#      group id      X      Y
# 1 18878713  1 524897 180779
# 2 18884056  2 531199 183111
# 3 18878713  1 525352 179484
# 4 18884056  2 532538 182503

A5C1D2H2I1M1N2O1R2T1 · Answer 2 · 2014-11-17T07:45:23.537

Here's an approach using merged.stack from my "splitstackshape" package:

library(splitstackshape)
merged.stack(mydf, var.stubs = c("X", "Y"), sep = "var.stubs", atStart = FALSE)
#       group id .time_1      X      Y
# 1: 18878713  1     End 525352 179484
# 2: 18878713  1   Start 524897 180779
# 3: 18884056  2     End 532538 182503
# 4: 18884056  2   Start 531199 183111

Generally, it is expected that names are provided in the form of "variable stub" + "sep" + "time value" (for instance, "X.Start", "X.End" and so on). In cases where there is no "sep" you can also specify the "sep" to be a regex of the "var.stubs". The atStart argument specifies whether to look for the variable stub at the start or at the end of the variable name.

In this example, you could also specify sep = "X$|Y$", indicating to look for an "X" or a "Y" at the end of the variable names and group those together. In this case, you would not use the atSart argument.

If you don't want that ".time_1" column, you can use compound statements with merged.stack, but note that by removing it, there would be a loss of information in your reshaped data:

merged.stack(mydf, var.stubs = c("X", "Y"), sep = "X$|Y$")[, .time_1 := NULL][]
#       group id      X      Y
# 1: 18878713  1 525352 179484
# 2: 18878713  1 524897 180779
# 3: 18884056  2 532538 182503
# 4: 18884056  2 531199 183111

Update: System timings

merged.stack is also designed to be pretty fast. Here's a comparison with base R's reshape on this dataset replicated to be 1 million rows.

## make the dataset 1 million rows
mydf <- do.call(rbind, replicate(500, mydf, FALSE))  ## 1K
mydf <- do.call(rbind, replicate(1000, mydf, FALSE)) ## 1M
mydf$id <- 1:nrow(mydf)                              ## Row-wise id

funMS <- function() merged.stack(mydf, var.stubs = c("X", "Y"), sep = "X$|Y$")

funR <- function() {
  reshape(mydf, idvar = c(1, 6),
          times = c("Start", "End"),
          v.names = c("X", "Y"),
          varying = list(c(2, 4), c(3, 5)),
          direction = "long")
}

system.time(funR())
#    user  system elapsed 
#  23.315   0.000  23.224 
system.time(funMS())
#    user  system elapsed 
#   2.173   0.000   2.207

Ananda, nice! I'm guessing the new feature could be integrated nicely with `merged.stack`? — Arun, Nov 17 '14 at 07:02

Arun · Answer 3 · 2015-06-15T21:39:14.053

With the new feature to melt multiple columns implemented in melt.data.table recently in devel version v1.9.5, you can do this:

require(data.table) ## 1.9.5
melt(dt, measure = patterns("^Start", "^End"))[, variable := NULL][]
#       group id value1 value2
# 1: 18878713  1 524897 180779
# 2: 18884056  2 531199 183111
# 3: 18878713  1 525352 179484
# 4: 18884056  2 532538 182503

You can install the development version by following these instructions.

How to use melt() in r for my situation?

3 Answers3

Update: System timings