I am working with longitudinal data and trying to reshape it from wide format to long format. The time-varying variables follow the naming pattern r<wave><variable> (for example, height collected in wave 1 is r1height). The identifiers are hhid (household ID) and pn (person ID). The data are unbalanced: some variables are observed from the first wave to the last, while others are only observed from the middle of the study (e.g., waves 3 to 5).
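The naming pattern can also be checked programmatically. The sketch below is only an illustration. It assumes every time-varying column matches the regular expression `^r([0-9]+)([a-z]+)$` (a lowercase stub and no other digits in the name), and it lists the waves in which each variable is present.

```r
# Sketch: parse column names of the form r<wave><variable>.
# Assumption: stubs consist of lowercase letters only (e.g. "weight", "bmi").
nms <- c("hhid", "pn",
         "r1weight", "r2weight", "r3weight", "r4weight", "r5weight",
         "r3bmi", "r4bmi", "r5bmi")
pattern <- "^r([0-9]+)([a-z]+)$"

tv <- nms[grepl(pattern, nms)]             # keep time-varying columns only
info <- data.frame(column = tv,
                   wave = as.integer(sub(pattern, "\\1", tv)),
                   stub = sub(pattern, "\\2", tv))

# Waves in which each variable was collected; vectors of unequal
# length show that the variables cover different sets of waves.
waves_by_stub <- split(info$wave, info$stub)
print(waves_by_stub)
```

With the real data, `nms <- names(df)` would replace the literal vector.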
I have already reshaped the data using merged.stack() from the splitstackshape package (see the code below).
df <- data.frame(hhid = c("10001", "10002", "10003", "10004"),
                 pn = c("001", "001", "001", "002"),
                 r1weight = c(56, 76, 87, 64),
                 r2weight = c(57, 75, 88, 66),
                 r3weight = c(56, 76, 87, 65),
                 r4weight = c(78, 99, 23, 32),
                 r5weight = c(55, 77, 84, 65),
                 r1height = c(151, 163, 173, 153),
                 r2height = c(154, 164, NA, 154),
                 r3height = c(NA, 165, NA, 152),
                 r4height = c(153, 162, 172, 154),
                 r5height = c(152, 161, 171, 154),
                 r3bmi = c(22, 23, 24, 25),
                 r4bmi = c(23, 24, 20, 19),
                 r5bmi = c(21, 14, 22, 19))
library(splitstackshape)
# merged.stack() (this is the result I want)
long1 <- merged.stack(df, id.vars = c("hhid", "pn"),
                      var.stubs = c("weight", "height", "bmi"),
                      sep = "var.stubs", atStart = FALSE, keep.all = FALSE)
Now I would like to know whether the base R reshape() function can produce the same result. My attempts have failed; for example, the call below returns a strange long data set. I suspect the sep argument is causing the problem, but I don't know how to specify a pattern for my time-varying variable names.
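The effect of sep = "" can be reproduced by hand. According to ?reshape, when sep = "" the default split argument is list(regexp = "[A-Za-z][0-9]", include = TRUE): each name is split just before the first digit that follows a letter, the first part is taken as the variable name, and the second part as the time. The sketch below applies that rule to a few names and then runs reshape() on a tiny hypothetical data set (`toy`, not part of the original data).

```r
# Reproduce reshape()'s default name split when sep = "":
# cut each name right after the first letter that is followed by a digit.
nms <- c("r1weight", "r2weight", "r3bmi")
pos <- regexpr("[A-Za-z][0-9]", nms)        # position of that letter
stub <- substr(nms, 1L, pos)                # treated as the variable name
time <- substr(nms, pos + 1L, nchar(nms))   # treated as the time value
print(data.frame(name = nms, stub = stub, time = time))
# Every column gets the stub "r"; wave and variable both end up in "time".

# The same thing happens inside reshape() on a small hypothetical data set
toy <- data.frame(id = 1:2, r1weight = c(56, 76), r2weight = c(57, 75))
bad <- reshape(toy, varying = c("r1weight", "r2weight"),
               sep = "", direction = "long", idvar = "id")
print(bad)  # a single value column named "r", time values "1weight"/"2weight"
```

For names of the form r<wave><variable>, the wave sits between the prefix and the stub, so no choice of sep makes the default guessing recover the stubs.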
# reshape() (wrong results)
# reshape() is part of base R (package stats), so no library() call is needed
namelist <- setdiff(names(df), c("hhid", "pn"))  # all time-varying columns
long2 <- reshape(data = df,
                 varying = namelist,
                 sep = "",
                 direction = "long",
                 idvar = c("hhid", "pn"))
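A minimal sketch of one possible workaround, under stated assumptions rather than as a definitive answer. Base reshape() requires every element of varying to contain one column per time point, so the waves in which bmi was not collected (r1bmi, r2bmi) are first added as all-NA columns. varying is then passed as an explicit list of column names per variable, together with v.names and times, so reshape() no longer has to guess anything from the names. The time column name `wave` is an arbitrary choice, and the column names, row order, and time labels may differ from the merged.stack() result.

```r
# Example data from the question
df <- data.frame(hhid = c("10001", "10002", "10003", "10004"),
                 pn = c("001", "001", "001", "002"),
                 r1weight = c(56, 76, 87, 64),
                 r2weight = c(57, 75, 88, 66),
                 r3weight = c(56, 76, 87, 65),
                 r4weight = c(78, 99, 23, 32),
                 r5weight = c(55, 77, 84, 65),
                 r1height = c(151, 163, 173, 153),
                 r2height = c(154, 164, NA, 154),
                 r3height = c(NA, 165, NA, 152),
                 r4height = c(153, 162, 172, 154),
                 r5height = c(152, 161, 171, 154),
                 r3bmi = c(22, 23, 24, 25),
                 r4bmi = c(23, 24, 20, 19),
                 r5bmi = c(21, 14, 22, 19))

stubs <- c("weight", "height", "bmi")
waves <- 1:5

# reshape() needs the same number of columns for every stub, so add
# all-NA columns for waves that were never collected (here r1bmi, r2bmi).
for (s in stubs) {
  for (w in waves) {
    col <- paste0("r", w, s)
    if (!col %in% names(df)) df[[col]] <- NA_real_
  }
}

# One vector of column names per stub, ordered by wave:
# list(c("r1weight", ..., "r5weight"), c("r1height", ...), c("r1bmi", ...))
varying <- lapply(stubs, function(s) paste0("r", waves, s))

long2 <- reshape(df,
                 direction = "long",
                 varying = varying,
                 v.names = stubs,   # long-format column for each list element
                 timevar = "wave",
                 times = waves,
                 idvar = c("hhid", "pn"))

# Sort by person and wave, and replace the generated row names
long2 <- long2[order(long2$hhid, long2$pn, long2$wave), ]
rownames(long2) <- NULL
print(long2)
```

Because varying and v.names are built from the same stubs vector, the list elements and the output column names stay aligned; adding another variable only requires extending stubs.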
Could anyone let me know how to address this problem?
Thanks