Here's an approach using merged.stack
from my "splitstackshape" package:
library(splitstackshape)
merged.stack(mydf, var.stubs = c("X", "Y"), sep = "var.stubs", atStart = FALSE)
# group id .time_1 X Y
# 1: 18878713 1 End 525352 179484
# 2: 18878713 1 Start 524897 180779
# 3: 18884056 2 End 532538 182503
# 4: 18884056 2 Start 531199 183111
Generally, it is expected that names are provided in the form of "variable stub" + "sep" + "time value" (for instance, "X.Start", "X.End" and so on). In cases where there is no "sep" you can also specify the "sep" to be a regex of the "var.stubs". The atStart
argument specifies whether to look for the variable stub at the start or at the end of the variable name.
In this example, you could also specify sep = "X$|Y$"
, indicating to look for an "X" or a "Y" at the end of the variable names and group those together. In this case, you would not use the atSart
argument.
If you don't want that ".time_1" column, you can use compound statements with merged.stack
, but note that by removing it, there would be a loss of information in your reshaped data:
merged.stack(mydf, var.stubs = c("X", "Y"), sep = "X$|Y$")[, .time_1 := NULL][]
# group id X Y
# 1: 18878713 1 525352 179484
# 2: 18878713 1 524897 180779
# 3: 18884056 2 532538 182503
# 4: 18884056 2 531199 183111
Update: System timings
merged.stack
is also designed to be pretty fast. Here's a comparison with base R's reshape
on this dataset replicated to be 1 million rows.
## make the dataset 1 million rows
mydf <- do.call(rbind, replicate(500, mydf, FALSE)) ## 1K
mydf <- do.call(rbind, replicate(1000, mydf, FALSE)) ## 1M
mydf$id <- 1:nrow(mydf) ## Row-wise id
funMS <- function() merged.stack(mydf, var.stubs = c("X", "Y"), sep = "X$|Y$")
funR <- function() {
reshape(mydf, idvar = c(1, 6),
times = c("Start", "End"),
v.names = c("X", "Y"),
varying = list(c(2, 4), c(3, 5)),
direction = "long")
}
system.time(funR())
# user system elapsed
# 23.315 0.000 23.224
system.time(funMS())
# user system elapsed
# 2.173 0.000 2.207