The proper way to test is with a likelihood ratio test: compare twice the difference in log-likelihood (i.e., the difference in deviance) between two nested models, a full and a reduced one:
m2 <- cph(Surv(time, status) ~ rcs(albumin, 4), data=d)  # full model: 4-knot spline, 3 d.f. for albumin
anova(m2)
m <- cph(Surv(time, status) ~ albumin, data=d)           # reduced model: albumin linear, 1 d.f.
# LR statistic = 2 * (difference in log partial likelihoods), 2 d.f. for the two nonlinear terms
p.val <- 1 - pchisq( 2*(m2$loglik[2] - m$loglik[2]), 2 )
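If you would rather not compute the statistic by hand, rms also provides lrtest() for comparing nested fits; a minimal sketch, assuming m2 and m are the two fits from above:
lrtest(m2, m)   # prints the LR chi-square, d.f., and p-value for the nested comparison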
You can compare the less accurate Wald statistic (which in your case was not significant anyway, since the p-value was > 0.05) against this more accurate method with the example on Harrell's ?cph help page:
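The simulation on that help page looks roughly like the following (reconstructed from memory, so details such as the seed and the censoring scheme may differ from the current help file):
library(rms)
n <- 1000
set.seed(731)                                    # assumed seed
age <- 50 + 12*rnorm(n)
sex <- factor(sample(c('Male','Female'), n, replace=TRUE, prob=c(.6, .4)))
cens <- 15*runif(n)                              # uniform censoring times
h <- .02*exp(.04*(age-50) + .8*(sex=='Female'))  # hazard linear in age on the log scale
dt <- -log(runif(n))/h                           # exponential event times
e <- ifelse(dt <= cens, 1, 0)                    # event indicator
dt <- pmin(dt, cens)
S <- Surv(dt, e)
f <- cph(S ~ rcs(age,4) + sex, x=TRUE, y=TRUE)   # full (spline) model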
> anova(f)
                Wald Statistics          Response: S

 Factor      Chi-Square d.f. P
 age          57.75     3    <.0001
  Nonlinear    8.17     2    0.0168
 sex          18.75     1    <.0001
 TOTAL        75.63     4    <.0001
You would incorrectly conclude that the nonlinear term was "significant" at the conventional 0.05 level, despite the fact that the data-generating code was entirely linear in age (on the log-hazard scale):
h <- .02*exp(.04*(age-50)+.8*(sex=='Female'))
Create a reduced model and compare:
f0 <- cph(S ~ age + sex, x=TRUE, y=TRUE)
anova(f0)
#-------------
                Wald Statistics          Response: S

 Factor Chi-Square d.f. P
 age    56.64      1    <.0001
 sex    16.26      1    1e-04
 TOTAL  75.85      2    <.0001
Now compare the deviances. The likelihood ratio statistic is twice the difference in log-likelihoods, referred to a chi-square distribution with 2 degrees of freedom (the two nonlinear spline terms that were dropped):
1 - pchisq( 2*(f$loglik[2] - f0$loglik[2]), 2 )
[1] 0.01545576
Here the more accurate LR test happens to agree with the Wald test: with this particular simulated dataset both flag the nonlinearity at the 0.05 level, a Type I error given that the generating model is linear in age.
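As a check, the same one-step comparison should work here too, since f and f0 are nested cph fits:
lrtest(f, f0)   # should reproduce the LR chi-square and p-value computed by hand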
I don't know why Harrell leaves this example in, because I've taken his RMS course and know that he endorses the cross-model comparison of deviance as the more accurate approach.