Time-varying covariates: formatting for one categorical variable with 3 levels

Question

(This is an edited version of a previously closed question)

I have a data.frame (condensed to testdata) of several demographic variables in addition to one variable with three levels for three possible comorbidities.

dput(testdata)
structure(list(age = c(31L, 48L, 19L, 23L, 24L, 24L, 40L, 22L, 
25L, 20L, 39L, 26L, 28L, 27L, 25L), gender = structure(c(2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("F", 
"M"), class = "factor"), race = structure(c(NA, NA, 1L, NA, 1L, 
NA, NA, NA, 2L, 1L, NA, 3L, NA, 1L, NA), .Label = c("C", "M", 
"N", "R", "Z"), class = "factor"), Time1 = c(NA, NA, NA, 319, 
NA, 133, NA, 121, NA, NA, 30, NA, NA, NA, NA), Time2 = c(NA, 
109, NA, NA, NA, NA, NA, NA, NA, NA, NA, 108, 52, NA, NA), Time3 = c(NA, 
73, NA, NA, 4, NA, NA, 121, NA, NA, 2, NA, NA, NA, NA), OutcomeTime = c(4380, 
199, 4380, 4380, 4380, 4380, 4380, 4380, 4380, 4380, 196, 4380, 
4380, 4380, 4380), CoMo1 = c(NA, NA, NA, 1, NA, 1, NA, 1, NA, 
NA, 1, NA, NA, NA, NA), CoMo2 = c(NA, 2, NA, NA, NA, NA, NA, 
NA, NA, NA, NA, 2, 2, NA, NA), CoMo3 = c(NA, 3, NA, NA, 3, NA, 
NA, 3, NA, NA, 3, NA, NA, NA, NA), Outcome = c(0, 1, 0, 0, 0, 
0, 0, 0, 0, 0, 1, 0, 0, 0, 0), ID = 1:15), class = "data.frame", row.names = 
c("1", 
"4", "6", "7", "8", "11", "14", "18", "19", "26", "27", "28", 
"30", "31", "38"))

Time1, Time2, and Time3 are the times at which the subject (ID) had the comorbidities CoMo1, CoMo2, and CoMo3, respectively. Outcome indicates whether death occurred. OutcomeTime is the time of death or, if the patient was followed to the end of the study without dying, the end of follow-up.

In trying to set this up using tmerge(), I've had some difficulty with the commands (first time doing this!):

newdata<-tmerge(data1=testdata[,c(1:3,12)], data2=testdata, id=ID, 
tstop=OutcomeTime, tstart=0)
newdata<-tmerge(newdata, testdata, id=ID, Comorbid=event(OutcomeTime))
newdata<-tmerge(newdata, testdata, id=ID, Comorbid=event(Time1))
newdata<-tmerge(newdata, testdata, id=ID, Comorbid=event(Time2))
newdata<-tmerge(newdata, testdata, id=ID, Comorbid=event(Time3))
newdata<-tmerge(newdata,newdata, ID, enum=cumevent(tstart))

I've tried several variations of the above and I'm not getting quite what I want:

1) I would like a column specifying which comorbidity is occurring in the interval

2) I would like a column for the Outcome variable. As it currently stands, for IDs that have comorbidities it seems to add another time interval for the Outcome, but the outcome itself is not recorded anywhere. It really should be its own variable.

3) Can someone explain the differences between event, cumevent, tdc, and cumtdc, and when to use each?

Thank you!

I cannot figure out how the data correspond to any real-world measurement. For instance, when I tabulate the Time columns to see how many non-missing values each row has, using `table(rowSums(!is.na(dat[, grepl("Time", names(dat))])))`, I get these counts of non-missing values (0, 1, 2, 3, 4, 5, 6) with these numbers of rows (21, 14, 6, 7, 5, 1, 1). So 21 rows have no time value, 14 rows have a single time value, and 2 rows have 5 or 6 values. How is this a survival experiment? — IRTFM, Apr 22 '20 at 18:00
I appear to have failed in my request for clarification. Voting to close. — IRTFM, Apr 28 '20 at 15:21
@IRTFM I've updated the question and data set. So sorry for the confusion! — Mircea_cel_Batran, May 06 '20 at 15:16
You still seem to be assuming we will know whether a "comorbidity" is permanent or transient. Once you have any one of the comorbidities, is it assumed to be present until OutcomeTime, so that if CoMo3 happens before CoMo2, both are assumed to be present until the event time? It also appears that a "2" in the CoMo2 column only means "present at Time2", so it is more like TRUE than a value of 2 of anything. — IRTFM, May 06 '20 at 19:50
@IRTFM that's a great question regarding the permanent or transient state. As it was explained to me, you have it at the time and that's it, hence having one variable with 3 factor levels, rather than 3 variables that are binary. The values of 1/2/3 in CoMo# columns are because I thought `tmerge()` would be able to use that for identification of the comorbidity. I was wrong, as I tried it as simple binary for each and it produced the same output. So it is indeed the presence of that comorbidity at that specific time. — Mircea_cel_Batran, May 06 '20 at 20:27
If that's the case, then the time of each CoMo_x would be the basis for another line with a start and end time. You need one line of data for each interval. There would be a single CoMo variable with 4 levels, with "none" as the base level. The task as I understand it would be to "unpack" the single lines into multiple lines, where the number of new lines would be the number of CoMo entries in the original data + 1 (at least for all the CoMo times that were > 0). If any of the CoMo times were 0 then there would only be the number of — IRTFM, May 07 '20 at 18:12
@IRTFM How can I do that? I thought that's what the tmerge() function would do, or do I need to reformat testdata as long first, with a column for CoMo and another for Time? Thank you for your help! — Mircea_cel_Batran, May 12 '20 at 12:47
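
One way to do the "unpacking" described in the comments is to stack the comorbidity times into a long data set and let `tmerge()` split the intervals. What follows is only a sketch under stated assumptions: `toy`, `base`, `como_long`, `death`, and `comorbid` are names made up for illustration, `toy` copies four subjects (IDs 1, 2, 4, 11) from `testdata`, and a comorbidity is assumed to stay "current" until the next one is recorded. In `tmerge()`, `event()` marks an outcome at the end of an interval and `tdc()` creates a covariate that changes value after the given time; `cumevent()` and `cumtdc()` are their running-total counterparts.

```r
library(survival)

# Toy subset of testdata (IDs 1, 2, 4, 11); values copied from the question
toy <- data.frame(
  ID          = c(1, 2, 4, 11),
  age         = c(31, 48, 23, 39),
  Time1       = c(NA, NA, 319, 30),
  Time2       = c(NA, 109, NA, NA),
  Time3       = c(NA, 73, NA, 2),
  OutcomeTime = c(4380, 199, 4380, 196),
  Outcome     = c(0, 1, 0, 1)
)

# Step 1: one row per subject covering (0, OutcomeTime]
base <- tmerge(data1 = toy[, c("ID", "age")], data2 = toy, id = ID,
               tstop = OutcomeTime)

# Step 2: the outcome becomes its own column via event()
base <- tmerge(base, toy, id = ID, death = event(OutcomeTime, Outcome))

# Step 3: stack the three Time columns into long format,
# one row per recorded comorbidity (coded 1, 2, 3)
como_long <- rbind(
  data.frame(ID = toy$ID, time = toy$Time1, como = 1),
  data.frame(ID = toy$ID, time = toy$Time2, como = 2),
  data.frame(ID = toy$ID, time = toy$Time3, como = 3)
)
como_long <- como_long[!is.na(como_long$time), ]

# Step 4: tdc() splits each subject's follow-up at the comorbidity times;
# the covariate takes the new value after each time
newdata <- tmerge(base, como_long, id = ID, comorbid = tdc(time, como))

# Intervals before any comorbidity are treated as "none" (assumption),
# then the numeric codes are turned into a 4-level factor
newdata$comorbid <- factor(
  ifelse(is.na(newdata$comorbid), 0, newdata$comorbid),
  levels = 0:3,
  labels = c("none", "CoMo1", "CoMo2", "CoMo3")
)

print(newdata)
```

The result has one row per interval with `tstart`, `tstop`, `death`, and `comorbid`, which is the layout that a call such as `coxph(Surv(tstart, tstop, death) ~ age + comorbid, data = newdata)` expects. Subjects with two comorbidities recorded at the same time (ID 8 at time 121 in `testdata`) would need a separate decision, which this sketch does not make.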
