Can I perform all subsets variable selection for a Cox Proportional Hazards Model in R?

Question

I am trying to use a function similar to, if not actually, regsubsets in the leaps package in program R when selecting the top Cox Proportional Hazards models for my data. Is this possible? and if so does a function already exist?

You should first be seriously questioning whether that is a good idea. — IRTFM, Jan 15 '13 at 22:02
What model selection procedure do you recommend? Is it not appropriate to see how close the competing models are before I accept a model as the "best"? — marcellt, Jan 15 '13 at 22:15
You should first be thinking about the underlying science in the domain of investiagation. It is not possible to recommend a "best procedure" unless the goals are clear. — IRTFM, Jan 15 '13 at 22:53
This is not a stats forum but a programming forum. I am looking for advice on whether or not this function exists, not whether or not I should or can justify using it. — marcellt, Jan 15 '13 at 23:07
My comments were designed to explain why I suspected the developers did not desire to put such a mechanism in the hands of people who did not know that they should be used with great caution. I suggest you search for discussion over the years in Rhelp from Terry Therneau, Frank Harrell, and Thomas Lumley regarding stepwise methods. Here's one: http://markmail.org/search/?q=list%3Aorg.r-project.r-help+therneau+stepwise#query:list%3Aorg.r-project.r-help%20therneau%20stepwise+page:1+mid:7qaizhv6g3schfif+state:results — IRTFM, Jan 15 '13 at 23:20
I read through the link you provided and they all seem very critical of stepwise variable selection. This is the exact reason I am trying to find another variable selection method (all subsets). — marcellt, Jan 16 '13 at 00:10
Not as far as I know. You're thinking of something like [glmulti](http://www.jstatsoft.org/v34/i12/paper). There's a good review of subset regression there. If you have a small no. of candidates you could use a loop to generate alternative models and store the results of interest in a data.frame then sort it. As you can see, it's gets quite demanding computationally for even modest numbers of predictors. You might want to look into a way making such a loop run in parallel. An alternative is to rewrite the intensive parts of the regression in a compiled language... — dardisco, Feb 02 '13 at 00:35

dardisco · Answer 1 · 2013-11-09T05:29:31.133

I'm guessing you're already familiar with the following... If you're using AIC as your criterion for 'top model' then this would be a reasonable starting point:

library(survival)
data(colon)
c1 <- coxph(Surv(time=time, event=status) ~
     as.factor(extent) + age + sex, data=colon)
step(c1)

Caution with this if you have missing values (NA). There may of course be a better model which is not found by this method, but with a small number of potential predictors you're unlikely to miss it. Caveats as above (thanks @DWin) about using using numerical methods where an informed opinion may be more reliable.

Can I perform all subsets variable selection for a Cox Proportional Hazards Model in R?

1 Answers1