I have a panel data set with a large number of time periods and units, and also a large number of variables which I observe for each period and unit.
Since I want to apply a univariate time series operation for each unit and variable, I have to bring the panel data to wide format (using data.table::dcast) so that each column shows now a variable for a given unit across time.
After applying my time series observation I want to go back to the "long" panel format (using data.table::melt), however, here, I lose information on the unit names and variables. Since the data.table is quite large, I am afraid of mixing up data here which is why I would like to find a melt operation that preserves the variables and value names.
Consider the following example panel data set:
require(data.table)
dates <- seq(from = as.Date("2007-02-01"), to = as.Date("2012-01-01"), by = "month")
id <- paste0(c("A", "B", "C"), 1:10)
DT <- data.table(
time = rep(dates, 10),
idx = rep(id, each = 60),
String1 = runif(600),
String2 = runif(600),
String3 = runif(600)
)
time idx String1 String2 String3
1: 2007-02-01 A1 0.5412122 0.23502234 0.3858354
2: 2007-03-01 A1 0.3248168 0.32884580 0.7183147
3: 2007-04-01 A1 0.4183034 0.40781723 0.7438458
4: 2007-05-01 A1 0.3597997 0.51745402 0.1660566
5: 2007-06-01 A1 0.6405351 0.96121729 0.7786483
---
596: 2011-09-01 A10 0.7896711 0.64740298 0.8285408
597: 2011-10-01 A10 0.6582652 0.83986453 0.1292342
598: 2011-11-01 A10 0.1110465 0.41741672 0.7076345
599: 2011-12-01 A10 0.5108850 0.02940229 0.9038370
600: 2012-01-01 A10 0.2605052 0.10136480 0.3881788
I am bringing this panel data set to wide format. After applying some time series operations with it (not shown here), I will have to delete some columns if there's not enough data for them. Then, I bring the data back to long format
variable_names <- names(DT[,-c("time", "idx")])
DT_long <- dcast(DT, time ~ idx, value.var = variable_names)
DT_long <- DT_long[,-(5:10)]
DT_wide <- melt(DT_long, measure = patterns("^String1", "^String2", "^String3"), value.name = variable_names, variable.name = "idx)
time idx String1 String2 String3
1: 2007-02-01 1 0.9794707 0.5290352 0.68009050
2: 2007-03-01 1 0.4016173 0.9229200 0.38652407
3: 2007-04-01 1 0.9475505 0.5956701 0.24686007
4: 2007-05-01 1 0.6465847 0.8233340 0.08008369
5: 2007-06-01 1 0.5704834 0.8232598 0.85790038
---
596: 2011-09-01 10 NA 0.5525413 0.79994190
597: 2011-10-01 10 NA 0.3895864 0.41347910
598: 2011-11-01 10 NA 0.3123646 0.44461146
599: 2011-12-01 10 NA 0.2148686 0.37609448
600: 2012-01-01 10 NA 0.7314114 0.47138012
DT_wide now looks like this which means I have lost the information on variable (here: idx) names. One fix I imagine is numbering all idx through with numbers and then doing this operation. However, if possible I would like to preserve string names since they help me a lot to distinguish and understand the values. Can someone help me how I can rewrite melt to include this information?