The typical preparation steps for mstate involve converting "wide" format data (one row per patient) into "multi-state" format data (multiple rows per patient, one for each possible transition in the multi-state model).
For example, data in wide format:
library(mstate)
data(ebmt4)
ebmt <- ebmt4
> head(ebmt)
  id  rec rec.s   ae ae.s recae recae.s  rel rel.s  srv srv.s      year agecl proph              match
1  1   22     1  995    0   995       0  995     0  995     0 1995-1998 20-40    no no gender mismatch
2  2   29     1   12    1    29       1  422     1  579     1 1995-1998 20-40    no no gender mismatch
3  3 1264     0   27    1  1264       0 1264     0 1264     0 1995-1998 20-40    no no gender mismatch
4  4   50     1   42    1    50       1   84     1  117     1 1995-1998 20-40    no    gender mismatch
5  5   22     1 1133    0  1133       0  114     1 1133     0 1995-1998   >40    no    gender mismatch
6  6   33     1   27    1    33       1 1427     0 1427     0 1995-1998 20-40    no no gender mismatch
This is converted to multi-state format as follows:
# Transition matrix: one list element per state, giving the states reachable from it
tmat <- transMat(
  x = list(c(2, 3, 5, 6), c(4, 5, 6), c(4, 5, 6), c(5, 6), c(), c()),
  names = c("Tx", "Rec", "AE", "Rec+AE", "Rel", "Death")
)
# Expand the wide data to one row per patient per possible transition
msebmt <- msprep(
  data = ebmt,
  trans = tmat,
  time = c(NA, "rec", "ae", "recae", "rel", "srv"),
  status = c(NA, "rec.s", "ae.s", "recae.s", "rel.s", "srv.s"),
  keep = c("match", "proph", "year", "agecl")
)
> head(msebmt)
An object of class 'msdata'
Data:
  id from to trans Tstart Tstop time status              match proph      year agecl
1  1    1  2     1      0    22   22      1 no gender mismatch    no 1995-1998 20-40
2  1    1  3     2      0    22   22      0 no gender mismatch    no 1995-1998 20-40
3  1    1  5     3      0    22   22      0 no gender mismatch    no 1995-1998 20-40
4  1    1  6     4      0    22   22      0 no gender mismatch    no 1995-1998 20-40
5  1    2  4     5     22   995  973      0 no gender mismatch    no 1995-1998 20-40
6  1    2  5     6     22   995  973      0 no gender mismatch    no 1995-1998 20-40
But what if my original dataset has time-varying covariates (i.e. it is already in long format) and I want to convert it into multi-state format? All of the tutorials I have found online, for example the mstate package vignette, only cover converting data that starts out wide, not data that starts out long.
So, let's say I have the data df below, where id identifies a patient, (start, stop] gives the time interval, state is the state the patient is in at the end of the interval, and tv.cov is their time-varying covariate (assumed constant over the interval). Note that only patient id=5 has three entries, and that person's tv.cov changes.
id start stop state tv.cov
1 0 1 1 1
2 0 4 1 2
3 0 7 1 1
4 0 10 1 5
5 0 6 1 4
5 6 10 2 10
5 10 15 3 12
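For reproducibility, the table above can be built as a data frame. This is a minimal sketch using only base R; the values are copied from the table and nothing else is assumed.

```r
# Long-format example data: one row per patient per (start, stop] interval
df <- data.frame(
  id     = c(1, 2, 3, 4, 5, 5, 5),
  start  = c(0, 0, 0, 0, 0, 6, 10),
  stop   = c(1, 4, 7, 10, 6, 10, 15),
  state  = c(1, 1, 1, 1, 1, 2, 3),   # state at the end of the interval
  tv.cov = c(1, 2, 1, 5, 4, 10, 12)  # constant within each interval
)

# Number of intervals per patient; only id 5 has more than one
table(df$id)
```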
Assuming the basic "illness-death" transition model:
tmat <- mstate::trans.illdeath(names = c("healthy", "sick", "death"))
> tmat
         to
from      healthy sick death
  healthy      NA    1     2
  sick         NA   NA     3
  death        NA   NA    NA
How can I prep df into multi-state format?
As a hack, should I set up the data in "wide" format, convert it to "multi-state" format using msprep, and then join another data frame onto it that contains the time-varying covariates for each patient at each time interval?
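To make the first step of that hack concrete, here is a rough base R sketch that collapses the long data to one row per patient. It rests on assumptions that are mine and may not match the intended coding: state 1 = healthy, 2 = sick, 3 = death; a transition is taken to occur at the stop time of the first interval ending in the new state; and patients who never reach a state are censored at their last stop time. The helper to_wide and the column names ill.t, ill.s, death.t, death.s are hypothetical.

```r
# Long-format example data (same values as the table in the question)
df <- data.frame(
  id     = c(1, 2, 3, 4, 5, 5, 5),
  start  = c(0, 0, 0, 0, 0, 6, 10),
  stop   = c(1, 4, 7, 10, 6, 10, 15),
  state  = c(1, 1, 1, 1, 1, 2, 3),
  tv.cov = c(1, 2, 1, 5, 4, 10, 12)
)

# Hypothetical helper: summarise one patient's rows as wide-format
# event times and status indicators (assumed coding: 2 = sick, 3 = death)
to_wide <- function(d) {
  last <- max(d$stop)               # end of follow-up for this patient
  sick <- d$stop[d$state == 2]      # stop times of intervals ending in "sick"
  dead <- d$stop[d$state == 3]      # stop times of intervals ending in "death"
  data.frame(
    id      = d$id[1],
    ill.t   = if (length(sick) > 0) min(sick) else last,
    ill.s   = as.integer(length(sick) > 0),
    death.t = if (length(dead) > 0) min(dead) else last,
    death.s = as.integer(length(dead) > 0)
  )
}

# Apply per patient and stack the results into one wide data frame
wide <- do.call(rbind, lapply(split(df, df$id), to_wide))
rownames(wide) <- NULL
wide
```

Under these assumptions, wide could then be passed to msprep with time = c(NA, "ill.t", "death.t") and status = c(NA, "ill.s", "death.s"). The join of tv.cov afterwards is the part I am unsure about, since a covariate change inside a transition interval (patient 5 at time 6) would seem to require splitting that row first.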