
I am converting longitudinal data from wide to long format. I am asking partly to understand what is going on "in the background" and partly to find out whether this is actually possible.

df1 is a combination of 4 waves of data which I previously joined with full_join. The data have the identifier pidp and three fixed variables, which I included in the first wave, followed by five time-varying variables in wave 1.

In waves 2, 3 and 4 there is an additional variable, jwbs1 (e.g. jwbs1_2), which occurs only at those waves and not at wave 1.

So there are five time varying variables in wave 1 but six time varying variables at waves 2, 3 and 4, as is shown at the bottom of the post.

I got the error message

 Error in reshapeLong(data, idvar = idvar, timevar = timevar, varying = varying,  : 
  'varying' arguments must be the same length

Is it possible to have a different number of time-varying variables in different waves and still convert them to long format? Is there a way round it?

The variables (listed for illustration) and the code I was using are below:

$ pidp
$ sex     
$ edtype
$ jbsat_1
$ sclfsato_1
$ jbsat_1
$ sf12mcs_1
$ scghq1_1
$ jbsat_2
$ sclfsato_2
$ jbsat_2
$ sf12mcs_2
$ scghq1_2
$ jwbs1_2
df2 <- reshape(
    data = df1,
    varying = 4:length(df1),
    timevar = "wave",
    sep = "_",
    idvar = "pidp",
    direction = "long"
)  
alias.123

1 Answer


With unbalanced data in wide form, you can either append the missing variable jwbs1_1 to your data frame and try again, or use the pivot_longer function from the tidyr package.

Base R (reshape, after appending the missing variable in wave 1):

df1_bal <- data.frame(append(df1, list(jwbs1_1=NA), after=8))
reshape(df1_bal, ...)
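
To see how this plays out, here is a minimal sketch on made-up data. The data frame toy, its values, and the reduction to two waves and two time-varying variables are all assumptions for illustration and are not taken from df1. The filler column jwbs1_1 is placed before jwbs1_2 so that the columns of each variable stay in wave order, which reshape relies on when it guesses the groups from the names.

```r
# Toy data (hypothetical values): jwbs1 is measured only at wave 2
toy <- data.frame(
  pidp    = c(1, 2),
  sex     = c(1, 2),
  jbsat_1 = c(5, 6),
  jbsat_2 = c(4, 7),
  jwbs1_2 = c(3, 2)
)

# Add the missing wave-1 column, filled with NA, just before jwbs1_2
toy_bal <- data.frame(append(toy, list(jwbs1_1 = NA), after = 4))

# Every time-varying variable now has one column per wave,
# so reshape can split the names on "_" without complaint
toy_long <- reshape(
  data      = toy_bal,
  varying   = 3:length(toy_bal),
  timevar   = "wave",
  sep       = "_",
  idvar     = "pidp",
  direction = "long"
)

print(toy_long)
```

The result has one row per person per wave, and jwbs1 is NA in the wave 1 rows.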

tidyr (pivot_longer):

library(tidyr)
pivot_longer(df1, cols = -c(pidp, sex, edtype),
             names_to = c(".value", "wave"), names_pattern = "(.*)_(\\d)")
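
As a rough illustration of why pivot_longer does not need the filler column, here is a small sketch on made-up data. The data frame toy and its values are assumptions, not the real df1. The ".value" part of names_to turns the text before the underscore into column names, and any variable/wave combination that does not exist in the wide data is filled with NA.

```r
library(tidyr)

# Toy data (hypothetical values): jwbs1 is measured only at wave 2
toy <- data.frame(
  pidp    = c(1, 2),
  sex     = c(1, 2),
  jbsat_1 = c(5, 6),
  jbsat_2 = c(4, 7),
  jwbs1_2 = c(3, 2)
)

# No jwbs1_1 column is needed: the missing wave-1 value becomes NA
toy_long <- pivot_longer(
  toy,
  cols          = -c(pidp, sex),
  names_to      = c(".value", "wave"),
  names_pattern = "(.*)_(\\d)"
)

print(toy_long)
```

Note that wave comes back as a character column here, so convert it (for example with as.integer) if you need it to be numeric.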
Edward