Generate survival curve with both time-dependent covariate and time varying coefficient

Question

I want code to generate survival curves in a setting with both

time dependent covariates and
time varying coefficients.

The goal is to demonstrate how billing method affects life insurance policy lapse. It’s complex in that

a customers billing method (invoice or EFT) changes over time,
the effect of billing method on lapse wears off over time, and
the effect of billing method on lapse depends on other covariates.

After reading the vignette on time dependent covariates, I don’t know how to generate survival curves from a model that has both time-dependent covariates and time-varying coefficients.

library(survival)

Samp <- data.frame(
  id = c(143,151,680,134),
  time = c(17,16,17,18) ,
  censor= rep(1,4) , 
  covariate = seq(5,20,length.out = 4))
# Lookup provides the values of a tdc
Lookup <- data.frame(
  id =c(rep(134,2),680,143,rep(151,3)) ,
  billing.mode = c("INV",rep("EFT",2),rep("INV",2),"EFT","INV") ,
  switch.time = c(0,3,rep(0,3),2,7))

# create the tdc 
Samp.tdc <- tmerge(data1=Samp,data2=Samp,id=id,
                    lapse=event(time,censor))
Samp.tdc <- tmerge(data1=Samp.tdc,data2=Lookup,id=id,
                    billing.mode=tdc(switch.time,billing.mode))
Samp.tdc$inv = as.numeric(Samp.tdc$billing.mode == "INV")

# the call looks something like this
fit <-coxph(Surv(tstart, tstop, lapse) ~ inv + tt(inv) +
  covariate*inv, data = Samp.tdc, 
            tt = function(x, t, ...) x * t)

When I say I want to generate survival curves, I mean predicted survival for a fixed set of times and covariate values. Let's say for the LpsData below.

LpsData <- data.frame(
  tstart = rep(c(0,16,17),times=4),
  tstop = rep(16:18,times=4) ,
  lapse = 0 ,
  covariate = rep(c(10,20),each=3,times=2) ,
  inv = rep(c(0,1),each=6) ,
  curve=rep(c('eft','inv'), each=6)
)

score 1 · Answer 1 · answered Dec 08 '18 at 19:43

This is a relatively complex problem and I personally find the capacities of the survival package limited in that regard. For example you have to pre-specify the functional form of the time-variation. An alternative is to use Piece-wise exponential Additive Models (PAMMs), which can be estimated via mgcv::gam and thus inherits all of their flexibility (+ penalized estimation of non-linear effects, including time-varying effects).

In general you have to decide what type of model you want to fit. Let z be your time-dependent covariate. Than potential models could be

linear covariate effect, linearly time-varying, i.e. the model specified in your code (mgcv formula: + z * t +)
non-linear covariate effect, linearly time-varying (formula: + s(z, by = t) +)
linear covariate effect, non-linearly time-varying (formula: + s(t, by = z) +)
non-linear, non-linearly time-varying( formula: + te(t, z) +)

Below is an example using the pbc data from the survival package that is also featured in the survival vignette on time-dependent covariates (see also https://adibender.github.io/pammtools/articles/tdcovar.html for comparison with PAMMs):

library(survival)
library(ggplot2)
theme_set(theme_bw())
library(pammtools)
library(mgcv)

Data transformation

First I transform the data to Piece-wise exponential Data (PED) format:

pbc <- pbc %>% filter(id <= 312) %>%
  select(id:sex, bili, protime) %>%
  mutate(status = 1L * (status == 2))

## Transform to piece-wise exponential data (PED) format
pbc_ped <- as_ped(
  data = list(pbc, pbcseq),
  formula = Surv(time, status)~. | concurrent(bili, protime, tz_var = "day"),
  id = "id") %>% ungroup()

pbc_ped <- pbc_ped %>%
  mutate(
    log_bili = log(bili),
    log_protime = log(protime))

Fit Piece-wise exponential additive model (PAM)

Here I fit a model with 2 time-dependent covariates with linear covariate effects, non-linearly time-varying (although estimates are almost linear due to penalization)

pbc_pam <- gam(ped_status ~ s(tend, k = 10) + s(tend, by = log_bili) +
  s(tend, by = log_protime),
  data = pbc_ped, family = poisson(), offset = offset)

Prediction of the survival for fixed covariates

For prediction I

create a new data set at all unique observed time-points (all unspecified covariates will be set to mean/modus values)
add a time-dependent value of log_bili at each time-point
add survival probability predictions + CI using add_surv_prob

ndf <- make_newdata(pbc_ped, tend = unique(tend)) %>%
  mutate(log_bili = runif(n(), min(log_bili), max(log_bili))) %>%
  add_surv_prob(pbc_pam)

Plot predicted survival probabilities

ggplot(ndf, aes(x = tend, y = surv_prob)) +
  geom_surv() +
  geom_ribbon(aes(ymin = surv_lower, ymax = surv_upper), alpha = 0.3) +
  ylim(c(0, 1))

```

^{Created on 2018-12-08 by the reprex package (v0.2.1)}

Generate survival curve with both time-dependent covariate and time varying coefficient

1 Answers1

Data transformation

Fit Piece-wise exponential additive model (PAM)

Prediction of the survival for fixed covariates

Plot predicted survival probabilities