I am trying to use a function similar to, if not actually, regsubsets in the leaps package in program R when selecting the top Cox Proportional Hazards models for my data. Is this possible? and if so does a function already exist?
Asked
Active
Viewed 3,863 times
5
-
2You should first be seriously questioning whether that is a good idea. – IRTFM Jan 15 '13 at 22:02
-
What model selection procedure do you recommend? Is it not appropriate to see how close the competing models are before I accept a model as the "best"? – marcellt Jan 15 '13 at 22:15
-
You should first be thinking about the underlying science in the domain of investiagation. It is not possible to recommend a "best procedure" unless the goals are clear. – IRTFM Jan 15 '13 at 22:53
-
This is not a stats forum but a programming forum. I am looking for advice on whether or not this function exists, not whether or not I should or can justify using it. – marcellt Jan 15 '13 at 23:07
-
My comments were designed to explain why I suspected the developers did not desire to put such a mechanism in the hands of people who did not know that they should be used with great caution. I suggest you search for discussion over the years in Rhelp from Terry Therneau, Frank Harrell, and Thomas Lumley regarding stepwise methods. Here's one: http://markmail.org/search/?q=list%3Aorg.r-project.r-help+therneau+stepwise#query:list%3Aorg.r-project.r-help%20therneau%20stepwise+page:1+mid:7qaizhv6g3schfif+state:results – IRTFM Jan 15 '13 at 23:20
-
1I read through the link you provided and they all seem very critical of stepwise variable selection. This is the exact reason I am trying to find another variable selection method (all subsets). – marcellt Jan 16 '13 at 00:10
-
1Not as far as I know. You're thinking of something like [glmulti](http://www.jstatsoft.org/v34/i12/paper). There's a good review of subset regression there. If you have a small no. of candidates you could use a loop to generate alternative models and store the results of interest in a data.frame then sort it. As you can see, it's gets quite demanding computationally for even modest numbers of predictors. You might want to look into a way making such a loop run in parallel. An alternative is to rewrite the intensive parts of the regression in a compiled language... – dardisco Feb 02 '13 at 00:35
1 Answers
0
I'm guessing you're already familiar with the following... If you're using AIC as your criterion for 'top model' then this would be a reasonable starting point:
library(survival)
data(colon)
c1 <- coxph(Surv(time=time, event=status) ~
as.factor(extent) + age + sex, data=colon)
step(c1)
Caution with this if you have missing values (NA
). There may of course be a better model which is not found by this method, but with a small number of potential predictors you're unlikely to miss it. Caveats as above (thanks @DWin) about using using numerical methods where an informed opinion may be more reliable.

dardisco
- 5,086
- 2
- 39
- 54