5

I am trying to use a function similar to, if not actually, regsubsets in the leaps package in program R when selecting the top Cox Proportional Hazards models for my data. Is this possible? and if so does a function already exist?

Jules
  • 14,200
  • 13
  • 56
  • 101
marcellt
  • 501
  • 5
  • 14
  • 2
    You should first be seriously questioning whether that is a good idea. – IRTFM Jan 15 '13 at 22:02
  • What model selection procedure do you recommend? Is it not appropriate to see how close the competing models are before I accept a model as the "best"? – marcellt Jan 15 '13 at 22:15
  • You should first be thinking about the underlying science in the domain of investiagation. It is not possible to recommend a "best procedure" unless the goals are clear. – IRTFM Jan 15 '13 at 22:53
  • This is not a stats forum but a programming forum. I am looking for advice on whether or not this function exists, not whether or not I should or can justify using it. – marcellt Jan 15 '13 at 23:07
  • My comments were designed to explain why I suspected the developers did not desire to put such a mechanism in the hands of people who did not know that they should be used with great caution. I suggest you search for discussion over the years in Rhelp from Terry Therneau, Frank Harrell, and Thomas Lumley regarding stepwise methods. Here's one: http://markmail.org/search/?q=list%3Aorg.r-project.r-help+therneau+stepwise#query:list%3Aorg.r-project.r-help%20therneau%20stepwise+page:1+mid:7qaizhv6g3schfif+state:results – IRTFM Jan 15 '13 at 23:20
  • 1
    I read through the link you provided and they all seem very critical of stepwise variable selection. This is the exact reason I am trying to find another variable selection method (all subsets). – marcellt Jan 16 '13 at 00:10
  • 1
    Not as far as I know. You're thinking of something like [glmulti](http://www.jstatsoft.org/v34/i12/paper). There's a good review of subset regression there. If you have a small no. of candidates you could use a loop to generate alternative models and store the results of interest in a data.frame then sort it. As you can see, it's gets quite demanding computationally for even modest numbers of predictors. You might want to look into a way making such a loop run in parallel. An alternative is to rewrite the intensive parts of the regression in a compiled language... – dardisco Feb 02 '13 at 00:35

1 Answers1

0

I'm guessing you're already familiar with the following... If you're using AIC as your criterion for 'top model' then this would be a reasonable starting point:

library(survival)
data(colon)
c1 <- coxph(Surv(time=time, event=status) ~
     as.factor(extent) + age + sex, data=colon)
step(c1)

Caution with this if you have missing values (NA). There may of course be a better model which is not found by this method, but with a small number of potential predictors you're unlikely to miss it. Caveats as above (thanks @DWin) about using using numerical methods where an informed opinion may be more reliable.

dardisco
  • 5,086
  • 2
  • 39
  • 54