
I have a question about the tmerge() function in the R package survival. I am trying to set up a data set with time-dependent covariates, but the covariate value for the initial time period is set to NA (see reprex below).

I have one data frame with baseline variables, time-, and event data, and a second data frame with variables measured 3 months after baseline.

I have used the same approach as the PBC data example in the vignette by Terry Therneau and co-authors (or tried to, at least: https://cran.r-project.org/web/packages/survival/vignettes/timedep.pdf). On p. 11 it says: "The tdc and cumtdc arguments can have 1, 2 or three arguments. The first is always the time point, the second, if present, is the value to be inserted, and an optional third argument is the initial value. If the tdc call has a single argument the result is always a 0/1 variable, 0 before the time point and 1 after. For the 2 or three argument form, the starting value before the first definition of the new variable (before the first time point) will be the initial value. The default for the initial value is NA, the value of the tdcstart option." I am not sure I understand the last bit of that passage.

I do not get the same problem when I replicate the PBC example. I tried specifying init in the second tmerge call and/or setting the tdcstart option, without success (both generate an error). There are no missing values in the covariates or the outcome (time, event).

I am reaching out here since I cannot work out what I am doing wrong.

Thanks a lot in advance!

PS. This is my first post, so I apologize if I have missed something. I hope it makes sense.

library(tidyverse)
library(survival)

set.seed(123)

# Generate data
df_base <- tibble(
  ID = as.numeric(1:100),
  time = as.integer(runif(100, min = 100, max = 730)),
  status = as.factor(sample(x = c("0", "1"), prob = c(0.7, 0.3), size = 100, replace = TRUE)),
  vas = as.integer(rnorm(n = 100, mean = 53, sd = 10)))

df_fu <- tibble(
  ID = as.numeric(1:100),
  fu_3mo = 91,
  vas = as.integer(rnorm(n = 100, mean = 44, sd = 15)))

# Baseline data
head(df_base)
#> # A tibble: 6 x 4
#>      ID  time status   vas
#>   <dbl> <int> <fct>  <int>
#> 1     1   281 0         45
#> 2     2   596 0         55
#> 3     3   357 0         50
#> 4     4   656 1         49
#> 5     5   692 0         43
#> 6     6   128 1         52

# Follow-up data
head(df_fu)
#> # A tibble: 6 x 3
#>      ID fu_3mo   vas
#>   <dbl>  <dbl> <int>
#> 1     1     91    76
#> 2     2     91    63
#> 3     3     91    40
#> 4     4     91    52
#> 5     5     91    37
#> 6     6     91    36

# Generate time-dependent covariates
df_tdc <- tmerge(df_base, df_base, id = ID, surgery = event(time, status))

head(df_tdc)
#>   ID time status vas tstart tstop surgery
#> 1  1  281      0  45      0   281       0
#> 2  2  596      0  55      0   596       0
#> 3  3  357      0  50      0   357       0
#> 4  4  656      1  49      0   656       1
#> 5  5  692      0  43      0   692       0
#> 6  6  128      1  52      0   128       1

df_tdc <- tmerge(df_tdc, df_fu, id = ID, vas = tdc(fu_3mo, vas))
#> Warning in tmerge(df_tdc, df_fu, id = ID, vas = tdc(fu_3mo, vas)): replacement
#> of variable 'vas'

head(df_tdc)
#>   ID time status vas tstart tstop surgery
#> 1  1  281      0  NA      0    91       0
#> 2  1  281      0  76     91   281       0
#> 3  2  596      0  NA      0    91       0
#> 4  2  596      0  63     91   596       0
#> 5  3  357      0  NA      0    91       0
#> 6  3  357      0  40     91   357       0

Created on 2021-11-26 by the reprex package (v0.3.0)

mejoh
  • Have you tried doing this without any of the tidyverse trappings? – IRTFM Nov 26 '21 at 19:04
  • Also ... does the PBC example in pkg:survival have an NA at the first row? – IRTFM Nov 27 '21 at 01:33
  • @IRTFM: Tidyverse or not does not seem to matter. I have been in contact with Terry Therneau. Briefly, I assumed that I could add the baseline value from data set 1 and the follow-up data from data set 2 to generate a long data set. However, tmerge() does not support this but requires that the tdc-data comes from the same data set. That is, a data set that is already in long format. – mejoh Nov 29 '21 at 20:01
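
The long-format approach described in the comment above could be sketched roughly as follows. This is only a minimal sketch under stated assumptions, not a confirmed fix: the names df_long and day are hypothetical, the baseline VAS is assumed to be measured at day 0, status is kept as numeric 0/1 rather than a factor, and plain data frames are used instead of tibbles. The idea is to stack the baseline and follow-up VAS values into one data set, as pbcseq does in the vignette, and to pass that single data set to tdc().

```r
library(survival)

set.seed(123)

# Simulated data, similar in shape to the reprex (status kept numeric here)
df_base <- data.frame(
  ID     = 1:100,
  time   = as.integer(runif(100, min = 100, max = 730)),
  status = rbinom(100, size = 1, prob = 0.3),
  vas    = as.integer(rnorm(100, mean = 53, sd = 10))
)

df_fu <- data.frame(
  ID     = 1:100,
  fu_3mo = 91,
  vas    = as.integer(rnorm(100, mean = 44, sd = 15))
)

# Hypothetical long data set: one row per subject and measurement occasion.
# Assumption: the baseline measurement belongs to day 0.
df_long <- rbind(
  data.frame(ID = df_base$ID, day = 0,            vas = df_base$vas),
  data.frame(ID = df_fu$ID,   day = df_fu$fu_3mo, vas = df_fu$vas)
)

# Step 1: set up the follow-up window and the event, leaving out the
# baseline vas column so that the later tdc() call does not replace it.
df_tdc <- tmerge(df_base[, c("ID", "time", "status")], df_base,
                 id = ID, surgery = event(time, status))

# Step 2: add vas as a time-dependent covariate from the long data set.
df_tdc <- tmerge(df_tdc, df_long, id = ID, vas = tdc(day, vas))

head(df_tdc)
```

With this layout the first interval of each subject should carry the baseline VAS and the interval starting at day 91 the follow-up VAS, so the initial value no longer falls back to the NA default.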
