For a simulation study I need to estimate more than a billion different logistic regressions (logit link), and I only need to keep the coefficient estimates. Nothing else.
I'm looking for a way to obtain these faster than `glm(..., family = binomial)`.
I tried optimizing the likelihood directly, but even when starting the algorithm from the true coefficient values it takes almost as long as `glm`.
Each model has 1-5 covariates and n between 100 and 1000. Furthermore, I'm running the main loop in parallel, and it works (it cut the previous runtime by around 70%), but it's still a long run if I need to simulate with n = 1000.
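One approach raised in the comments below is to call `glm.fit()` directly, bypassing the formula and model-frame machinery of `glm()`, which dominates the per-call cost for many small fits. A minimal sketch, assuming a single replication with a design matrix `X` (intercept column included) and a 0/1 response `y`, both hypothetical stand-ins for one simulated data set:

```r
## One hypothetical replication: X is an n x (p+1) design matrix that
## already contains an intercept column; y is a 0/1 response.
set.seed(1)
n <- 500
X <- cbind(1, matrix(rnorm(n * 3), n, 3))
y <- rbinom(n, 1, plogis(drop(X %*% c(-0.5, 1, -1, 0.5))))

## glm.fit() skips formula parsing and model.frame construction,
## which is most of glm()'s per-call overhead on small models.
fit <- glm.fit(x = X, y = y, family = binomial())
fit$coefficients  # the only output the simulation needs to keep
```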
- Is this a duplicate? http://stackoverflow.com/questions/16284766/how-to-speed-up-glm-estimation-in-r (I'm not 100% sure, as that one is about doing a single relatively large fit). Optimizing the likelihood directly should be much *slower* than `glm()`. (Billions? Wow.) PS: it would be helpful to know the number of observations/parameters involved (on average) in the individual regressions. – Ben Bolker Jan 26 '16 at 13:00
- I was going to give the same link as Ben (in particular, his comment there). I believe an optimized BLAS and parallelization (e.g., renting some CPUs from Amazon EC2 if you don't have a cluster at your institution) would be an obvious approach. – Roland Jan 26 '16 at 13:04
- No, it's not a duplicate: these are around a billion realizations of similar situations, with 1-5 covariates and n between 100 and 1000. Furthermore, I'm running the main loop in parallel, and it works (I reduced the previous runtime by around 70%), but it's still a long run if I need to simulate n = 1000. – Giovanni Romeo Jan 26 '16 at 13:06
- Well, how far have you gotten with `speedglm()` and/or `glm.fit`? Can you tell us how much the current set of suggestions helps? You should edit your question to include the information about covariates and n (comments are ephemeral) ... – Ben Bolker Jan 26 '16 at 13:23
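A hedged sketch of those two suggestions side by side, reusing the `X` and `y` from the example above; `speedglm.wfit()` is the matrix interface of the speedglm package, and the exact argument list is worth checking against `?speedglm.wfit`:

```r
## speedglm's matrix interface, like glm.fit(), avoids formula overhead.
## X is again assumed to already contain the intercept column.
library(speedglm)

fit_sg <- speedglm.wfit(y = y, X = X, family = binomial())
coef(fit_sg)
```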
- I will try the commands you suggested, and I found the MRO (Microsoft R Open) build of R. – Giovanni Romeo Jan 26 '16 at 15:17
- An update: I've tried an optimized BLAS (via MRO) and packages like foreach (for parallelization) and speedglm. Using the BLAS without forcing the loops to run in parallel improves things only slightly, forcing parallelization (with foreach) takes more or less the same time as vanilla R, and speedglm seems to be incompatible with foreach loops. – Giovanni Romeo Feb 04 '16 at 13:55
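For reference, a hypothetical sketch of what such a foreach-based loop might look like when only the coefficients are kept; `sims` and `simulate_one()` are placeholders for the poster's own simulation settings and data generator, not code from the original post:

```r
## Hypothetical parallel loop: one row of coefficients per simulated
## data set. sims and simulate_one() are placeholders.
library(foreach)
library(doParallel)

cl <- parallel::makeCluster(parallel::detectCores() - 1)
registerDoParallel(cl)

coef_mat <- foreach(s = sims, .combine = rbind) %dopar% {
  d <- simulate_one(s)  # placeholder: returns list(X = ..., y = ...)
  glm.fit(x = d$X, y = d$y, family = binomial())$coefficients
}

parallel::stopCluster(cl)
```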
- Curious as to what you ended up doing, Giovanni. Did you try using `glm.fit` directly instead of `glm`? – uller Jul 30 '19 at 18:02
- Hi Uller, I had almost forgotten about this post! In the end I did none of what is written here: I used parallel computing on a cluster. With a 30-hour session, things were done. – Giovanni Romeo Aug 01 '19 at 08:25