I have a long-formatted set of longitudinal data with two variables that change over time (cognitive_score
and motor_score
), subject id (labeled subjid
) and age of each subject in days at the moment of measurement(labeled agedays
). Measurements were taken twice.
I want to transform it to wide-formatted longitudinal dataset.
The problem is that agedays
measurements are unique for each subject, and the only way to see which measurement entry was the first, and which was the second, is to check where the agedays is higher (agedays
higher than in the other entry means second measurement, lower agedays
means first measurement).
We thus have this dataset:
subjid agedays cognitive_score motor_score
<int> <int> <dbl> <dbl>
1 4900001 457 0.338 0.176
2 4900001 1035 0.191 0.216
3 4900002 639 0.25 0.176
4 4900002 1248 0.176 0.353
5 4900003 335 0.103 0.196
6 4900003 913 0.176 0.196
And what I tried was using reshape:
reshape(dataset_col, direction = "wide", idvar = "subjid", timevar = "agedays", v.names = c("cognitive_score", "motor_score"))
Where dataset_col
is the name of the dataset.
What it does, however, is adding these two columns:
The numbers in the name of the columns seem to be the values of agedays
variable.
Any advice on how I can do this?