
I have a sample longitudinal dataset with the following columns: PATIENTID (patient ID), VISITNUMBER (the sequence number of each hospital visit), TIME (years since the first visit), AGE (age at each visit), SEX (0 = male, 1 = female), and HEALTH (health status at each visit).

This is my sample dataset in R:

#data structure

PATIENTID <- c(126, 126, 126, 255, 255, 389, 389, 389, 389, 389, 470, 470, 470)

VISITNUMBER <- c(1, 2, 3, 1, 2, 1, 2, 3, 4, 5, 1, 2, 3)

TIME <- c(0, 4, 6, 0, 3, 0, 1, 2, 3, 4, 0, 1, 2)

AGE <- c(18, 22, 24, 20, 23, 30, 31, 32, 33, 34, 40, 41, 42)

SEX <- c(0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0)

HEALTH <- c(0.333, 0.452, 0.468, 0.571, 0.522, 0.444, 0.452, 0.431, 0.510, 0.532, 0.214, 0.333, 0.400)

mydata <- data.frame(PATIENTID, VISITNUMBER, TIME, AGE, SEX, HEALTH)

#converting PATIENTID and VISITNUMBER to factor 

mydata$PATIENTID   <- factor(mydata$PATIENTID)

mydata$VISITNUMBER <- factor(mydata$VISITNUMBER)

I want to predict HEALTH in a regression model (HEALTH ~ AGE + SEX) while adjusting for baseline HEALTH (HEALTH at visit 1). I see two options:

  1. Create a separate variable, HEALTH1, holding each patient's HEALTH at visit 1, so that my dataset looks like the following. How do I code this? I did it manually in this example, but my real dataset is much larger.

Health at visit 1 variable added

  2. Without creating a separate variable, include HEALTH at visit 1 directly as an explanatory variable in the model call, something like HEALTH ~ AGE + SEX + HEALTHif(visit1) (pseudocode). If this is possible, how do I write that term?
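To make the two options concrete, here is a minimal base-R sketch of what each could look like. It rebuilds the sample data so it runs on its own. The names `baseline`, `fit1`, and `fit2` are hypothetical, the use of `lm()` is an assumption (the question does not say which regression function is intended), and option 2 assumes the rows are sorted by visit within each patient.

```r
# Rebuild the sample data from the question so the sketch is self-contained.
mydata <- data.frame(
  PATIENTID   = factor(c(126, 126, 126, 255, 255, 389, 389, 389, 389, 389, 470, 470, 470)),
  VISITNUMBER = factor(c(1, 2, 3, 1, 2, 1, 2, 3, 4, 5, 1, 2, 3)),
  TIME        = c(0, 4, 6, 0, 3, 0, 1, 2, 3, 4, 0, 1, 2),
  AGE         = c(18, 22, 24, 20, 23, 30, 31, 32, 33, 34, 40, 41, 42),
  SEX         = c(0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0),
  HEALTH      = c(0.333, 0.452, 0.468, 0.571, 0.522, 0.444, 0.452,
                  0.431, 0.510, 0.532, 0.214, 0.333, 0.400)
)

# Option 1: take the visit-1 rows as a lookup table (hypothetical name
# `baseline`), then copy each patient's baseline HEALTH onto all their rows.
baseline <- mydata[mydata$VISITNUMBER == "1", c("PATIENTID", "HEALTH")]
mydata$HEALTH1 <- baseline$HEALTH[match(mydata$PATIENTID, baseline$PATIENTID)]

# lm() is an assumption here; the same formula could go to another fitter.
fit1 <- lm(HEALTH ~ AGE + SEX + HEALTH1, data = mydata)

# Option 2: compute the baseline inside the formula with ave(), which applies
# FUN within each PATIENTID group and returns a vector as long as the data.
# x[1] is the first row per patient, so this relies on rows being in visit order.
fit2 <- lm(HEALTH ~ AGE + SEX + ave(HEALTH, PATIENTID, FUN = function(x) x[1]),
           data = mydata)
```

Both fits use the same predictor values, so they should give the same coefficients; option 1 has the advantage that HEALTH1 can be inspected before modelling. In either case the visit-1 rows have HEALTH equal to the baseline term, and whether to keep those rows in the model is a separate decision this sketch does not address.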

Any alternative suggestions are welcome. Thank you!
