
I use R to fit a lot of GLMs on medium-to-large data sets: typically 500k-1M rows and up to 50 factors in my models (prior to simplification - banding or dropping factors that aren't predictive, etc.).

Base R's glm() doesn't seem to cope well with this size of problem. I can and do use revoScaleR::rxGlm() instead, which is much better in this respect, but it has its own problems (patchy documentation, inability to use other R functions designed to work with glm objects, etc.).
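
For illustration, here's roughly what the two call forms look like - made-up data frame and column names (`policies`, `claim`, `age_band` and so on), not my real models:

```r
## Illustrative only: 'policies' and the column names are placeholders.
library(RevoScaleR)

## Base R glm() - gets slow and memory-hungry at ~1M rows with many factors
fit_glm <- glm(claim ~ age_band + region + vehicle_group,
               data = policies, family = poisson())

## rxGlm() - copes with the data size, but returns an rxGlm object,
## so functions written for "glm" objects don't work on it directly
fit_rx <- rxGlm(claim ~ age_band + region + vehicle_group,
                data = policies, family = poisson())
```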

Are there any alternatives that I'm not aware of? What's currently the preferred glm package for this sort of thing?

(I do need to stick to the GLM framework for the moment - I may at some point branch out into other modelling techniques - of which there are plenty of course - but that's one for later on...)

Thanks.

Alan
  • You can limit the number of factors in your dataset. Usually when there are many factors, `glm` will have some issues. You can also use a label encoder instead of factors. – Reza Aug 27 '20 at 23:41
  • 1
    make sure your model matrices are sparse (use `Matrix::sparse.model.matrix` and pass the results to `glm.fit` ...); `speedglm`, `fastglm`, `biglm` package (latter only if you need out-of-memory computations) – Ben Bolker Aug 28 '20 at 00:26
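
A rough sketch of what the packages suggested in the comments look like in practice (same placeholder data frame and columns as above, not from the original question):

```r
## Sketch of the approaches suggested in the comments; 'policies' and the
## column names are placeholders.
library(speedglm)  # faster, more memory-efficient drop-in for glm()
library(biglm)     # chunked fitting for data that doesn't fit in memory

## speedglm(): same formula/family interface as glm()
fit_speed <- speedglm(claim ~ age_band + region + vehicle_group,
                      data = policies, family = poisson())

## bigglm(): processes the data in chunks of 'chunksize' rows
fit_big <- bigglm(claim ~ age_band + region + vehicle_group,
                  data = policies, family = poisson(), chunksize = 100000)
```

Both keep the familiar formula interface, so the mental model from glm() carries over; bigglm() trades some speed for not needing the whole model matrix in memory at once.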

0 Answers