I have a sample longitudinal dataset with the following columns: PATIENTID (patient ID), VISITNUMBER (the sequence number of each hospital visit for that patient), TIME (time in years since the first visit), AGE (age at each visit), SEX (0 = male, 1 = female), and HEALTH (health status at each visit).
This is my sample dataset in R:
#data structure
PATIENTID <- c(126, 126, 126, 255, 255, 389, 389, 389, 389, 389, 470, 470, 470)
VISITNUMBER <- c(1, 2, 3, 1, 2, 1, 2, 3, 4, 5, 1, 2, 3)
TIME <- c(0, 4, 6, 0, 3, 0, 1, 2, 3, 4, 0, 1, 2)
AGE <- c(18, 22, 24, 20, 23, 30, 31, 32, 33, 34, 40, 41, 42)
SEX <- c(0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0)
HEALTH <- c(0.333, 0.452, 0.468, 0.571, 0.522, 0.444, 0.452, 0.431, 0.510, 0.532, 0.214, 0.333, 0.400)
mydata <- data.frame(PATIENTID, VISITNUMBER, TIME, AGE, SEX, HEALTH)
#converting PATIENTID and VISITNUMBER to factor
mydata$PATIENTID <- factor(mydata$PATIENTID)
mydata$VISITNUMBER <- factor(mydata$VISITNUMBER)
I want to model HEALTH in a regression (HEALTH ~ AGE + SEX) while adjusting for baseline HEALTH, i.e. HEALTH at visit 1. I see two options:
- Create a separate variable, HEALTH1, that holds each patient's HEALTH at visit 1 on every row for that patient, so that my dataset looks like the following. How do I code this? I built it by hand for this example, but my real dataset is much larger.
[Image: dataset with the health-at-visit-1 variable (HEALTH1) added]
- Skip the separate variable and instead pass HEALTH restricted to visit 1 directly into the model formula as an extra explanatory variable, something like HEALTH ~ AGE + SEX + HEALTHif(visit1). If this is possible, how do I write that term in the regression call?
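For reference, the first option could be sketched roughly as follows in base R. This is only a sketch: it assumes every patient has exactly one row with VISITNUMBER equal to 1, and the names `baseline` and `fit` are illustrative, not part of my real code. The plain `lm()` call is also an assumption; it ignores the repeated-measures structure.

```r
# Toy data copied from the sample above
PATIENTID <- c(126, 126, 126, 255, 255, 389, 389, 389, 389, 389, 470, 470, 470)
VISITNUMBER <- c(1, 2, 3, 1, 2, 1, 2, 3, 4, 5, 1, 2, 3)
TIME <- c(0, 4, 6, 0, 3, 0, 1, 2, 3, 4, 0, 1, 2)
AGE <- c(18, 22, 24, 20, 23, 30, 31, 32, 33, 34, 40, 41, 42)
SEX <- c(0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0)
HEALTH <- c(0.333, 0.452, 0.468, 0.571, 0.522, 0.444, 0.452, 0.431, 0.510, 0.532, 0.214, 0.333, 0.400)
mydata <- data.frame(PATIENTID, VISITNUMBER, TIME, AGE, SEX, HEALTH)
mydata$PATIENTID <- factor(mydata$PATIENTID)
mydata$VISITNUMBER <- factor(mydata$VISITNUMBER)

# Keep one row per patient: the visit-1 row (assumes exactly one such row each)
baseline <- mydata[mydata$VISITNUMBER == 1, c("PATIENTID", "HEALTH")]

# Look up each row's patient in the baseline table and copy that HEALTH value
mydata$HEALTH1 <- baseline$HEALTH[match(mydata$PATIENTID, baseline$PATIENTID)]

# Baseline-adjusted model (assumption: ordinary least squares on all rows)
fit <- lm(HEALTH ~ AGE + SEX + HEALTH1, data = mydata)
```

On the visit-1 rows HEALTH equals HEALTH1 by construction, so it may make sense to fit the model on follow-up visits only, e.g. with `data = mydata[mydata$VISITNUMBER != 1, ]`.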
Any alternative suggestions are welcome. Thank you!