1

I'm running a cox proportional hazard model on a heuristic dataset of operational delays given two covariates: cost and demand:

> dat.op
   delay   censor cost demand
1 2.875000      1 3.10   0.1
2 1.569444      1 0.68   0.1
3 2.000000      1 6.05   0.2
4 1.750000      1 5.22   0.1
5 2.000000      1 4.67   0.3
6 3.000000      1 9.30   1.4

Under coxph, both covariates, as expected, have a negative effect on the hazard rate, with coefficients -0.0813 and -2.5490. In other words (ignoring the high p values for now), both cost and demand contribute to increased operational delay:

> coxph(Surv(delay, censor) ~ cost + demand, data=dat.op)
Call:
coxph(formula = Surv(delay, censor) ~ cost + demand, data = dat.op)

          coef exp(coef) se(coef)     z    p
cost   -0.0813    0.9219   0.3909 -0.21 0.84
demand -2.5490    0.0782   3.6635 -0.70 0.49

Likelihood ratio test=3.35  on 2 df, p=0.187
n= 6, number of events= 6 

However, when I run the data through flexsurvreg so to also obtain parametric estimates of the underlying hazard distribution (assuming the commonly used Weibull), I observe different effects:

> flexsurvreg(Surv(delay, censor) ~ cost + demand, data=dat.op, dist="weibull")
Call:
flexsurvreg(formula = Surv(delay, censor) ~ cost + demand, data = dat.op, 
    dist = "weibull")

Estimates: 
        data mean  est      L95%     U95%     se       exp(est)  L95%     U95%   
shape        NA     5.2163   2.7040  10.0627   1.7487       NA        NA       NA
scale        NA     2.5223   1.3762   4.6226   0.7796       NA        NA       NA
cost     4.8367    -0.0427  -0.2119   0.1264   0.0863   0.9582    0.8091   1.1348
demand   0.3667     0.3943  -0.4005   1.1891   0.4055   1.4834    0.6700   3.2842

N = 6,  Events: 6,  Censored: 0
Total time at risk: 13.19444
Log-likelihood = -3.914008, df = 4
AIC = 15.82802

Here, demand has a coefficient of 0.3943, suggesting that it reduces operational delay, which is nonsensical.

Switching over to the Gompertz distribution, I now see that cost reduces operational delay:

> flexsurvreg(Surv(delay, censor) ~ cost + demand, data=dat.op, dist="gompertz")
Call:
flexsurvreg(formula = Surv(delay, censor) ~ cost + demand, data = dat.op, 
    dist = "gompertz")

Estimates: 
        data mean  est        L95%       U95%       se         exp(est)   L95%       U95%     
shape          NA   2.27e+00   7.05e-01   3.83e+00   7.97e-01         NA         NA         NA
rate           NA   6.10e-03   2.36e-05   1.58e+00   1.73e-02         NA         NA         NA
cost     4.84e+00   2.56e-01  -6.92e-01   1.21e+00   4.84e-01   1.29e+00   5.00e-01   3.34e+00
demand   3.67e-01  -2.26e+00  -6.98e+00   2.46e+00   2.41e+00   1.04e-01   9.27e-04   1.17e+01

N = 6,  Events: 6,  Censored: 0
Total time at risk: 13.19444
Log-likelihood = -4.208994, df = 4
AIC = 16.41799

Am I mis-interpreting these flexsurvreg results? Is there a way to obtain Weibull parameter estimates from a set of output more consistent with the estimates from coxph?

neither-nor
  • 1,245
  • 2
  • 17
  • 30
  • 1
    Might be informative to see what these model give as predictions while holding one variable constant. Interpreting raw regression coefficients can be misleading. (For instance, the survival package uses a different parametrization for the Weibull distribution than does the stats package.) – IRTFM Jun 05 '19 at 15:24
  • 1
    In addition to comments above, I wouldn't put too much faith in the interpretation of the sign for sample size of 6. For both distributions the confidence intervals cover 0 for both coefficients. Different signs and high uncertainty is what you would expect for such small sample size and given the different assumptions about the baseline distribution – adibender Jun 06 '19 at 14:20
  • @adibender: Excellent points. Many people think that the "sample size" of a population is the number of cases at risk and fail to understand that almost all of the statistical power for a survival analysis comes from the number of events. – IRTFM Jun 06 '19 at 16:12
  • @42 Great advice. But can you expand on what you mean by "the survival package uses a different parametrization for the Weibull than does the stats package"? – neither-nor Jun 07 '19 at 01:45

0 Answers0