
Lately I have been trying to fit a lot of random effects models to relatively big datasets: say, 50,000 people (or more) observed at up to 25 time points each. With such a large sample size we also include a lot of predictors to adjust for, maybe 50 or so fixed effects. I'm fitting the model to a binary outcome using `lme4::glmer` in R, with a random intercept for each subject. I can't go into specifics on the data, but the basic format of the `glmer` command I used was:

library(lme4)

fit <- glmer(outcome ~ treatment + study_quarter + dd_quarter + (1|id),
             family = "binomial", data = dat)

where both study_quarter and dd_quarter are factors with about 20 levels each.

When I try to fit this model in R, it runs for about 12-15 hours and then returns a warning that it failed to converge. I did a bunch of troubleshooting (e.g., following these guidelines), with no improvement, and the convergence isn't even close in the end: the max gradient is around 5-10, whereas the convergence criterion is 0.001, I think. The kind of fix I tried is sketched below.
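For reference, my troubleshooting looked roughly like the following minimal sketch; the optimizer and maxfun settings are illustrative choices from those guidelines, not something that worked for me:

library(lme4)

# Suggested fix: switch the optimizer and raise the evaluation limit
fit2 <- glmer(outcome ~ treatment + study_quarter + dd_quarter + (1|id),
              family = "binomial", data = dat,
              control = glmerControl(optimizer = "bobyqa",
                                     optCtrl = list(maxfun = 2e5)))

# Refit with every available optimizer to check whether the warning is
# optimizer-specific (lme4's allFit(); very slow on data this size)
summary(allFit(fit2))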

I then tried fitting the model in Stata, using the melogit command. The model fit in under 2 mins, with no convergence issues. The corresponding Stata command is

melogit outcome treatment i.study_quarter i.dd_quarter || id:

What gives? Does Stata just have a better fitting algorithm, or one better optimized for large models and large datasets? It’s really surprising how different the run times were.

Jonathan Gellar
  • [Here](https://www.statalist.org/forums/forum/general-stata-discussion/general/1371838-gsem-not-estimating) is an example of the opposite case: R quickly handles an apparently identical SEM model that is impossible to estimate in Stata. – radek Jun 22 '17 at 07:58
  • I am not sure, but could it be that the default option in glmer for the binomial family is probit rather than logit? Maybe you could add `family = binomial(link = "logit")` and try again? – Erdne Htábrob Jun 22 '17 at 09:49
  • @radek - Thank you for sharing, but I am referring specifically to mixed effects modeling, not SEMs. I know there are plenty of cases where R "beats" Stata. – Jonathan Gellar Jun 23 '17 at 20:38
  • @EndreBorbáth - no, the default is logit – Jonathan Gellar Jun 23 '17 at 20:39
  • I'm having exactly the same experience as you describe - days of running time and convergence warnings in R with a binomial `glmer` - Stata `melogit` fits within minutes, getting close if not identical results for the same specification – Tom Wagstaff Nov 03 '22 at 15:46

2 Answers


The glmer fit will probably be much faster with the optional argument nAGQ=0L. You have many fixed-effects parameters (20 levels for each of study_quarter and dd_quarter generate a total of 38 contrasts), and the default optimization method (corresponding to nAGQ=1L) puts all of those coefficients into the general nonlinear optimization call. With nAGQ=0L these coefficients are instead optimized within the much faster penalized iteratively reweighted least squares (PIRLS) algorithm. The default will generally provide a better estimate, in the sense that the deviance at the estimate is lower, but the difference is usually very small while the time difference is enormous.
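For the model in the question, the change is a single argument; a sketch, reusing the question's hypothetical dat:

library(lme4)

# Same model as in the question; the fixed-effects coefficients are now
# estimated inside PIRLS instead of the general nonlinear optimizer
fit0 <- glmer(outcome ~ treatment + study_quarter + dd_quarter + (1|id),
              family = "binomial", data = dat, nAGQ = 0L)

# deviance(fit0) will typically be slightly higher than that of a
# (much slower) default nAGQ = 1L fit, but usually only negligibly so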

I have a write-up of the differences between these algorithms in a Jupyter notebook, nAGQ.ipynb. That write-up uses the MixedModels package for Julia instead of lme4, but the methods are similar. (I am one of the authors of lme4 and the author of MixedModels.)

If you are going to be doing a lot of GLMM fits I would consider doing so in Julia with MixedModels. It is often much faster than R, even with all the complicated code in lme4.

Douglas Bates
  • I tried setting `nAGQ` to 0, and the model fit in about 5 minutes with no convergence problems. Still a bit slower than Stata, but more than acceptable. – Jonathan Gellar Jun 23 '17 at 20:48
  • 1
  • I wonder if a model with the default `nAGQ=1L` would converge faster if `glmer` first fit the model with PIRLS (corresponding to `nAGQ=0`) to get better starting values and then refined the fit with the Laplace approximation (`nAGQ=1`). Maybe that's already done internally, but if so I'm surprised that after 15 hours of run time the model was nowhere near convergence (max gradient around 5-10). – Jonathan Gellar Jun 23 '17 at 20:53
  • An initial `nAGQ=0L` fit is used to create starting estimates for `nAGQ=1L`. It helps some, but not as much as might be expected; the problem is the number of parameters in the nonlinear optimization routine. You can also try changing the optimizer to `NLOPT_LN_BOBYQA` from the `nloptr` package (see the sketch after this comment thread). – Douglas Bates Jun 24 '17 at 13:14
  • I gave the wrong link for the `nAGQ.ipynb` notebook; the corrected link is [here](https://github.com/dmbates/MixedModelsinJulia/blob/master/nAGQ.ipynb). – Douglas Bates Jun 24 '17 at 13:16
  • One follow-up question: when you say the estimate is better for `nAGQ=1L` than for PIRLS, are you referring to the fixed effects estimates, the random effects estimates, or both? If my parameters of interest are all fixed effect parameters, am I in any way "safer" in using the PIRLS approximation than if I was interested in the random effects? – Jonathan Gellar Jun 28 '17 at 18:11
  • By "better" I simply mean a lower value of the Laplace approximation to the deviance. That is, the estimates are closer to the maximum likelihood estimates. – Douglas Bates Jun 30 '17 at 21:52
  • Game changer - I've actually managed to run the model in R since starting to read this post! – Tom Wagstaff Nov 03 '22 at 16:01
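A minimal sketch of the optimizer switch suggested in the comments above, assuming lme4's `nloptwrap` interface to nloptr; the algorithm name follows nloptr's naming convention:

library(lme4)

# Route the nonlinear optimization through nloptr's BOBYQA implementation
fit_bobyqa <- glmer(outcome ~ treatment + study_quarter + dd_quarter + (1|id),
                    family = "binomial", data = dat,
                    control = glmerControl(optimizer = "nloptwrap",
                                           optCtrl = list(algorithm = "NLOPT_LN_BOBYQA")))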

Are you sure that Stata is reading in the whole file?

http://www.stata.com/manuals13/rlimits.pdf

The reason I ask is that it sounds like you've got 50k persons observed at 25 time points (1,250,000 rows); depending on the version of Stata you're using, you could be getting truncated results.

EDIT Since it's not a file-length issue, have you tried other packages for mixed effects, like nlme? I suspect the nlme machinery might fit your data somewhat faster; see the sketch below.
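For a binomial outcome, nlme itself only fits Gaussian models; the usual route through the nlme machinery is `MASS::glmmPQL`. A minimal sketch, with the caveat that PQL is a different approximation, so its estimates need not match glmer or melogit:

library(MASS)  # glmmPQL fits GLMMs via nlme's lme machinery

# Penalized quasi-likelihood fit of the same specification
fit_pql <- glmmPQL(outcome ~ treatment + study_quarter + dd_quarter,
                   random = ~ 1 | id,
                   family = binomial, data = dat)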

EDIT This resource on choosing between the different model-fitting functions may be more helpful than anything: https://stats.stackexchange.com/questions/173813/r-mixed-models-lme-lmer-or-both-which-one-is-relevant-for-my-data-and-why

nsring
  • Source is 2 versions out of date; the Stata 15 document is at http://www.stata.com/manuals/rlimits.pdf. Either way, 1 million observations (rows, in your terms) is not itself an issue in any serious Stata. – Nick Cox Jun 21 '17 at 13:45
  • I have not tried nlme, and I agree it's worth a shot. But I doubt that "somewhat faster" would be the difference between 15 hrs (without convergence) and 2 mins (with convergence). I imagine the fitting algorithms would share a lot in common, due to the overlap in authors, but that is just an assumption. – Jonathan Gellar Jun 21 '17 at 18:12
  • And to confirm what @NickCox said, the model output says that it is using all the observations in the dataset, so I don't think this is an issue. – Jonathan Gellar Jun 21 '17 at 20:34
  • @nsring - Regarding the link in your second edit, this thread focuses on differences between an lmer fit using REML and a lme fit using ML. Once both models were fit with REML, the two approaches gave near-identical results. Since I have a binomial outcome, I am using glmer, which uses ML (see http://bbolker.github.io/mixedmodels-misc/glmmFAQ.html#reml-for-glmms). Stata also uses ML, so this cannot be the difference in their fits. – Jonathan Gellar Jun 23 '17 at 14:58