Simulate competing risk data

Question

My goal is to simulate a data set that can be used to test a competing risk model. I am just trying a simple example with the survsim::crisk.sim function but it does not lead to the results I expect.

 require(survival)
 simulated_data <- survsim::crisk.sim(n = 100,
                                      foltime = 200,
                                      dist.ev = rep("weibull", 2),
                                      anc.ev = c(0.8, 0.9),
                                      beta0.ev = c(2, 4),
                                      anc.cens = 1,
                                      beta0.cens = 5,
                                      nsit = 2)

 model <- survreg(Surv(time, status) ~ 1 + strata(cause), data = simulated_data)

 exp(model$scale)

 ## cause=1  cause=2 
 ## 4.407839 2.576357

I would expect these numbers to be the same as beta0.ev. Any pointers to what I might do wrong or other suggestions how to simulate competing risk data.

For completion: I would like the events in the simulated data to occur following a Weibull distribution that is different for each risk. I would like to be able to specify a strata and cluster in the data. The censoring can follow a Weibull or Bernouli distribution.

Check the examples for survreg and make sure that the model survreg is fitting is parameterized in the same was as crisk.sim. — atiretoo, Sep 30 '15 at 02:19
The help page for survreg specifically warns us about the parametrization for Weibull parameters. `# survreg's scale = 1/(rweibull shape) # survreg's intercept = log(rweibull scale)` — IRTFM, Sep 30 '15 at 04:54

Jon · Accepted Answer · 2020-12-11T13:37:29.423

To recover your specified estimates, you can use survreg with cause-specific notation.

This example uses your parameters, but with more patients for more precise estimates:

set.seed(101)
stack_data <- survsim::crisk.sim(n = 2000,
                                     foltime = 200,
                                     dist.ev = rep("weibull", 2),
                                     anc.ev = c(0.8, 0.9),
                                     beta0.ev = c(2, 4),
                                     anc.cens = 1,
                                     beta0.cens = 5,
                                     nsit = 2)

m1 <- survreg(Surv(time, cause==1) ~ 1, data =stack_data, dist = "weibull")
m2 <- survreg(Surv(time, cause==2) ~ 1, data = stack_data, dist = "weibull")

m1$coefficients This will approach beta0.ev for cause 1

m2$coefficients This will approach beta0.ev for cause 2

> m1$coefficients
(Intercept) 
   1.976449 
> m2$coefficients
(Intercept) 
   3.995716

m1$scale This will approach anc.ev for cause 1

m2$scale This will approach anc.ev for cause 2

> m1$scale
[1] 0.8088574
> m2$scale
[1] 0.8923334

Unfortunately this only holds with uniform censoring, or low non-uniform censoring (such as in your example)

If we increase the hazard of censoring then the intercepts do not represent the beta0.ev parameters

set.seed(101)
stack_data <- survsim::crisk.sim(n = 2000,
                                     foltime = 200,
                                     dist.ev = rep("weibull", 2),
                                     anc.ev = c(0.8, 0.9),
                                     beta0.ev = c(2, 4),
                                     anc.cens = 1,
                                     beta0.cens = 2, #reduced from 5, increasing the hazard function for censoring rate
                                     nsit = 2)

m1 <- survreg(Surv(time, cause==1) ~ 1, data =stack_data, dist = "weibull")
m2 <- survreg(Surv(time, cause==2) ~ 1, data = stack_data, dist = "weibull")

> m1$coefficients
(Intercept) 
   1.531818 
> m2$coefficients
(Intercept) 
   3.553687 
> 
> m1$scale
[1] 0.8139497
> m2$scale
[1] 0.8910465

Simulate competing risk data

1 Answers1