
I'm surprised I couldn't find any answer to this, or anyone doing something similar.

Question: I am running a bootstrap on a GLM that takes a long time. However, I don't care about any output/SEs/tests or matrices that glm gives me, just the MLE for the coefficients (because I am using the bootstrap for uncertainty). It seems it would be far faster and more memory-efficient to just run the ML optimizer, store the MLE for each coefficient, and rinse/repeat. Is there any way to do this besides writing an optim() script from scratch?

P.S. Q: Is there any way to tell glm() to use an algorithm other than IRWLS, like Newton-Raphson, etc.?

The X matrix is about 100 million observations by 10-15 parameters. I haven't had any serious issues with convergence, just a big matrix to deal with. I have been using fastglm(, method = 3), which I found to be the fastest and to give answers comparable to glm().

It takes about 15-20 minutes to run the regression, so getting 200 or so bootstrap replicates takes a few days. I would like to run them in parallel but lack the memory for it.


1 Answer


You should benchmark glm.fit(), which takes a model matrix X and a response y instead of doing all of the formula handling; also try the fastglm package.

You might be able to speed glm.fit up a little bit by giving it the predicted values from the initial fit as mustart (this should cut the number of iterations required to converge).
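
For example, a minimal sketch of that warm-start trick with glm.fit (untested; X, y, and binomial() here are placeholders for your own model matrix, response, and family):

## fit once on the full data
fit0 <- glm.fit(X, y, family = binomial())

## one bootstrap replicate, warm-started from the full-data fitted values
idx  <- sample(nrow(X), replace = TRUE)
fitb <- glm.fit(X[idx, , drop = FALSE], y[idx],
                family  = binomial(),
                mustart = fit0$fitted.values[idx])
fitb$coefficients   ## keep only the coefficient MLEs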

glm() takes a method= argument; the default is glm.fit (IRLS), but you can swap in something else if you like. It's hard to know what you would want to substitute, though: at least for canonical link functions, IRLS is equivalent to Newton-Raphson. Do you have a better/faster fitting algorithm in mind? As for "just run[ning] the ML optimize[r]": it will be pretty hard to improve on IRLS, I think.
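
To illustrate the method= mechanism (a small made-up example; a user-supplied fitter has to accept the same arguments as glm.fit and return a compatible object):

set.seed(1)
dd0 <- data.frame(x = rnorm(1000))
dd0$y <- rbinom(1000, size = 1, prob = plogis(dd0$x))

## the default, spelled out explicitly
f1 <- glm(y ~ x, data = dd0, family = binomial(), method = "glm.fit")

## a user-supplied fitting function can be passed instead; here we just
## hand glm.fit back, so the results should be identical
f2 <- glm(y ~ x, data = dd0, family = binomial(), method = glm.fit)
all.equal(coef(f1), coef(f2))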


Since you're already using fastglm, there's less scope for improvement. If one bootstrap replicate takes 15 minutes, then removing overhead (e.g. by using fastglmPure instead of fastglm, analogous to the difference between glm.fit and glm) will hardly do anything. However, trying out different values of method (for faster but slightly less numerically stable algorithms) is probably worth a try ... (given that you're already using method=3 I'm guessing you've tried that).
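
As a rough sketch of what a coefficient-only bootstrap loop looks like with fastglmPure (X, y, binomial(), and nboot = 200 are placeholders; adapt to your own model):

library(fastglm)

nboot <- 200
boot_coefs <- matrix(NA_real_, nrow = nboot, ncol = ncol(X),
                     dimnames = list(NULL, colnames(X)))
for (i in seq_len(nboot)) {
    idx <- sample(nrow(X), replace = TRUE)
    fb  <- fastglmPure(X[idx, , drop = FALSE], y[idx],
                       family = binomial(), method = 3)
    boot_coefs[i, ] <- fb$coefficients   ## store only the MLEs
}
apply(boot_coefs, 2, quantile, c(0.025, 0.975))   ## percentile CIs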

The only other performance trick I can think of, in the absence of sufficient RAM/computing resources to run the bootstrap replicates in parallel, is to experiment with optimized/parallelized BLAS libraries. These should not in general require much additional memory, and might speed up your linear algebra considerably.
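
You can check which BLAS/LAPACK your R session is linked against, and, if you install an optimized multi-threaded BLAS (e.g. OpenBLAS), tune its thread count; the RhpcBLASctl package (not used elsewhere in this answer) is one option:

sessionInfo()   ## recent R versions report the BLAS/LAPACK libraries in use

## with a multi-threaded BLAS installed, the thread count can be tuned:
## install.packages("RhpcBLASctl")
RhpcBLASctl::blas_get_num_procs()
RhpcBLASctl::blas_set_num_threads(8)   ## 8 is just an illustrative value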


A little bit of benchmarking:

library(fastglm)
library(microbenchmark)
library(ggplot2)

## simulate logistic-regression data: one predictor, binary response
set.seed(101)
n <- 1e6
dd <- data.frame(x = rnorm(n))
dd$y <- rbinom(n, size = 1, prob = plogis(2 - 3 * dd$x))
X <- model.matrix(~x, data = dd)


## compare the formula interface, glm.fit, and the fastglm variants
b <- binomial()
m1 <- microbenchmark(
    glm = glm(y~x, dd, family = b),
    glm.fit = glm.fit(X, dd$y, family = b),
    fastglm = fastglm(X, dd$y, family = b),
    fpure = fastglmPure(X, dd$y, family = b),
    fpure3 = fastglmPure(X, dd$y, family = b, method = 3),
    times  = 20)

autoplot(m1)

[Figure: autoplot(m1), microbenchmark timings comparing glm, glm.fit, fastglm, and fastglmPure]

  • Thanks for the suggestions. I am actually already using fastglm with initial values from a prior regression on the full sample. The procedure takes about 15-20 minutes as is, so getting enough bootstraps takes a couple of days. I am not running in parallel because I don't have enough memory... – KevinM Feb 04 '23 at 17:01
  • Can you add these details to your question and give more details of the computations? (What are the dimensions of your `X` [number of observations and predictors]? How many bootstrap replicates are you running, and how long does each take?) Can you get access to cloud computing resources? – Ben Bolker Feb 04 '23 at 17:07
  • oh wait: 15-20 minutes **per replicate** ?? – Ben Bolker Feb 04 '23 at 17:36
  • Added some details to the question. I can't use cloud computing, but am looking into getting more RAM allocated for my server instance. – KevinM Feb 04 '23 at 21:47
  • I would be interested to hear back whether using `fastglmPure` with an alternative method buys you any significant speed. – Ben Bolker Feb 04 '23 at 21:48
  • As in your example, it is marginally faster, but there's no major difference. One additional trick suggested to me was to stratify the data by subgroups (so they are smaller and can fit into RAM in parallel); this changes the estimator, but not necessarily in a bad way. – KevinM Feb 09 '23 at 16:34
  • Ben, I thought you might be interested: I tested out the parglm package, which breaks the QR decomposition into chunks and uses parallelized operations, and it was much faster than the other options, probably because I have 32 cores to deploy. The other thing I haven't tried yet but am thinking about is the "bag of little bootstraps" method, which may speed up the bootstrap procedure immensely. I am still looking into how it performs for longitudinal / non-i.i.d. data. – KevinM Mar 03 '23 at 17:46