
I am working with a large dataset that contains missing data, and I use the `mice` package in R for multiple imputation. Having created 10 imputed datasets, I want to run stepwise model selection on each of them (see also https://stats.stackexchange.com/questions/46719/multiple-imputation-and-model-selection). For this I use the `with` function as described in the linked question. However, I expect this to take a few hours, so I would like to run it on multiple cores. Can this be done?

Sanderr
  • Well, you can see your options here: https://cran.r-project.org/web/views/HighPerformanceComputing.html However, if your code takes a few hours to run, just start it running before looking at these tools. I doubt they'll be easy to set up quickly if you're doing anything at all complicated. – Frank Aug 04 '16 at 15:11
  • I use `parLapply` to parallelize the imputation; would this also work for model selection? – Sanderr Aug 04 '16 at 15:44
  • The `with.mids` function is really just a `for`-loop that applies an expression to each completed data set independently of one another. An alternative/equivalent procedure would be to save the completed data sets in a `list` (using `complete`) and to use `lapply` on the list instead of `with.mids`. This can easily be parallelized using `parLapply` or similar. – SimonG Aug 11 '16 at 14:17
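
A minimal sketch of the approach SimonG describes in the comment above, under stated assumptions: the built-in `nhanes` data from `mice` stands in for the real dataset, the linear model with `chl` as the outcome is purely illustrative, `fit_step` is a hypothetical helper name, and the cluster size of 2 is arbitrary. It extracts the completed datasets into a list and runs `step` on each one through `parLapply`.

```r
library(mice)
library(parallel)

# Illustrative stand-in for the real data: nhanes ships with mice.
imp <- mice(nhanes, m = 10, printFlag = FALSE, seed = 1)

# Store each completed data set in a plain list instead of using with().
completed <- lapply(seq_len(imp$m), function(i) mice::complete(imp, i))

# Hypothetical helper: fit a full model and run stepwise selection on one
# completed data set. lm() and step() are called in the same function so
# that step() can re-evaluate the model call and still find `dat`.
fit_step <- function(dat) {
  full <- lm(chl ~ age + bmi + hyp, data = dat)
  step(full, trace = 0)
}

# Assumed cluster size; adjust to the number of cores available.
cl <- makeCluster(2)
fits <- parLapply(cl, completed, fit_step)
stopCluster(cl)

# Tabulate how often each term was retained across the imputed data sets.
selected <- lapply(fits, function(f) attr(terms(f), "term.labels"))
table(unlist(selected))
```

Because the selected model can differ between imputed datasets, the fits are kept as a list here rather than pooled directly; how to combine them is the subject of the linked Cross Validated question.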
