1

I'm trying to do exactly what reshape from the stats package is designed for. I have a wide dataset with a series of variables in the form var_name.date. Unfortunately, reshape seems ill-equipped to deal with even medium-sized datasets, so I'm trying to use the the data.table.melt function.

My main problem is grouping the variables into separate value columns based on their long-form variable. Is this possible, or do I need to do each one separately and then cbind them?

Here is what I have:

widetable = data.table("id"=1:5,"A.2012-10"=runif(5),"A.2012-11"=runif(5),
                       "B.2012-10"=runif(5),"B.2012-11"=runif(5))


   id  A.2012-10 A.2012-11  B.2012-10 B.2012-11
1:  1 0.82982349 0.2257782 0.46390924 0.4448248
2:  2 0.46136746 0.2184797 0.05640388 0.4772663
3:  3 0.61723234 0.3950625 0.03252784 0.4006974
4:  4 0.19963437 0.7028052 0.06811452 0.3096969
5:  5 0.09575389 0.5510507 0.76059610 0.8630222

And here is the the stats package's reshape mocking me with one-line awesomeness doing exactly what I want but not scaling.

reshape(widetable, idvar="id", varying=colnames(widetable)[2:5],
        sep=".", direction="long")


    id  time          A          B
 1:  1 2012-10 0.82982349 0.46390924
 2:  2 2012-10 0.46136746 0.05640388
 3:  3 2012-10 0.61723234 0.03252784
 4:  4 2012-10 0.19963437 0.06811452
 5:  5 2012-10 0.09575389 0.76059610
 6:  1 2012-11 0.22577823 0.44482478
 7:  2 2012-11 0.21847969 0.47726629
 8:  3 2012-11 0.39506249 0.40069737
 9:  4 2012-11 0.70280519 0.30969695
10:  5 2012-11 0.55105075 0.86302220
user3747260
  • 465
  • 1
  • 5
  • 14

1 Answers1

1

This is just one of those times when reshape() is more straightforward to use.

The most direct approach using a combination of melt and dcast.data.table that I can think of is as follows:

library(data.table)
library(reshape2)

longtable <- melt(widetable, id.vars = "id")
vars <- do.call(rbind, strsplit(as.character(longtable$variable), ".", TRUE))
dcast.data.table(longtable[, c("V1", "V2") := lapply(1:2, function(x) vars[, x])],
                 id + V2 ~ V1, value.var = "value")

An alternative is to use merged.stack from my "splitstackshape" package, specifically the development version.

# library(devtools)
# install_github("splitstackshape", "mrdwab", ref = "devel")
library(splitstackshape)

merged.stack(widetable, id.vars = "id", var.stubs = c("A", "B"), sep = "\\.")
#     id .time_1          A         B
#  1:  1 2012-10 0.26550866 0.2059746
#  2:  1 2012-11 0.89838968 0.4976992
#  3:  2 2012-10 0.37212390 0.1765568
#  4:  2 2012-11 0.94467527 0.7176185
#  5:  3 2012-10 0.57285336 0.6870228
#  6:  3 2012-11 0.66079779 0.9919061
#  7:  4 2012-10 0.90820779 0.3841037
#  8:  4 2012-11 0.62911404 0.3800352
#  9:  5 2012-10 0.20168193 0.7698414
# 10:  5 2012-11 0.06178627 0.7774452

The merged.stack function works differently from a simple melt because it starts by "stacking" different groups of columns in a list and then merging them together. This allows the function to:

  1. Work with column groups where each column group might be of a different type (character, numeric, and so on).
  2. Work with "unbalanced" column groups (where one group might have two measure columns and another might have three).

This answer is based on the following sample data:

set.seed(1) # Please use `set.seed()` when sharing an example with random numbers
widetable = data.table("id"=1:5,"A.2012-10"=runif(5),"A.2012-11"=runif(5),
                       "B.2012-10"=runif(5),"B.2012-11"=runif(5))

See also: What reshaping problems can melt/cast not solve in a single step?

Community
  • 1
  • 1
A5C1D2H2I1M1N2O1R2T1
  • 190,393
  • 28
  • 405
  • 485
  • Thanks for the reply and for confirming that what I want cannot be done as simply using melt as it can be done using melt. I've been trying with your package using merge.stack but I think I have to resign to doing it piecemeal. For reference, the table has ~150k rows(ids) and 555 variables -- 38 * 14 time periods + some others that aren't related to time. It's about 450mb and I'm working with 8GB RAM and 8GB virtual memory, but both your `merge.stack` and `reshape` exceed my memory capacity. – user3747260 Jul 12 '14 at 07:48
  • @user3747260, Just curious: if you load the data as a "data.table", what size does `tables()` report? – A5C1D2H2I1M1N2O1R2T1 Jul 12 '14 at 11:32