Background: I am fitting a logistic mixed model with two fixed effects, and two crossed random effects. The two variables making up the fixed effects are both binary categorical variables (i.e., there are no numeric predictors).

The data are from an eye tracker, and each subject has thousands of 0s and 1s over time. One model I wanted to consider is a logistic mixed model, accounting for correlation of the 0s and 1s within an individual, and within the image an individual is viewing at a given time (over the entire experiment, they look at six different images, out of 12 possible). (As an aside, I do plan to also analyze these data with a growth curve analysis, but was also curious to try this, ignoring the time course temporarily.)

Issue: I ran the model using dummy coding for both binary predictors, with the default baseline level for each. The model converged. It also made sense based on plots of the data I have already made: estimates were in the expected directions, significance of fixed effects was as expected, etc.

Then, I wanted to refit the model reparameterized so that the other level of one of the predictors is the baseline, in order to get the associated p-value (see, e.g.: https://stats.stackexchange.com/questions/449723/calculating-simple-effects-from-standard-regression-output). I used relevel() to do this, then fit the exact same model again. However, this time there was a convergence warning. I tried it again (because why not), and it failed again. I cannot come up with a reason why the choice of baseline level should affect convergence (or lack thereof).
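
For reference, the releveling step looked roughly like this (a sketch, since the exact call isn't shown here; it assumes Intervention is a character or factor column in data01):

data01$Intervention <- relevel(factor(data01$Intervention), ref = "Pre")
levels(data01$Intervention)  # "Pre" should now be listed first, i.e., it is the baseline

Model 1 below uses the default (alphabetical) coding, under which "Post" is the baseline.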

Data structure: Here is a small dummy dataset showing the general structure of the data. It is much reduced from the actual dataset, which has ~5000 observations (0s and 1s) per individual per image, ~60 subjects, and is over 1 million lines long in total. Note Time_stamp is not in the model here, as I am not considering the time course in this analysis. "AOI" in AOI_look stands for "area of interest," a common term in eye tracking data.

structure(list(Subject = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), Image = c("A", "A", 
"A", "B", "B", "B", "C", "C", "C", "D", "D", "D", "E", "E", "E", 
"F", "F", "F", "G", "G", "G", "E", "E", "E", "D", "D", "D", "A", 
"A", "A", "J", "J", "J", "L", "L", "L"), Time_stamp = c(0L, 1L, 
10L, 0L, 8L, 16L, 0L, 7L, 16L, 0L, 1L, 10L, 0L, 8L, 16L, 0L, 
7L, 16L, 0L, 8L, 16L, 0L, 7L, 16L, 0L, 1L, 9L, 0L, 8L, 16L, 0L, 
8L, 16L, 0L, 1L, 9L), Intervention = c("Pre", "Pre", "Pre", "Pre", 
"Pre", "Pre", "Pre", "Pre", "Pre", "Post", "Post", "Post", "Post", 
"Post", "Post", "Post", "Post", "Post", "Pre", "Pre", "Pre", 
"Pre", "Pre", "Pre", "Pre", "Pre", "Pre", "Post", "Post", "Post", 
"Post", "Post", "Post", "Post", "Post", "Post"), Sex = c("Female", 
"Female", "Female", "Female", "Female", "Female", "Female", "Female", 
"Female", "Female", "Female", "Female", "Female", "Female", "Female", 
"Female", "Female", "Female", "Male", "Male", "Male", "Male", 
"Male", "Male", "Male", "Male", "Male", "Male", "Male", "Male", 
"Male", "Male", "Male", "Male", "Male", "Male"), AOI_look = c(1L, 
1L, 1L, 0L, 0L, 0L, 1L, 0L, 1L, 0L, 0L, 0L, 1L, 1L, 1L, 0L, 0L, 
1L, 1L, 0L, 0L, 0L, 1L, 1L, 1L, 1L, 1L, 0L, 1L, 1L, 1L, 0L, 1L, 
1L, 1L, 1L)), class = "data.frame", row.names = c(NA, -36L))

Model 1: Intervention = Post as baseline: No convergence warning

logit_model01 <- glmer(AOI_look ~ factor(Sex)*factor(Intervention) + 
                                  (1 | Image) + 
                                  (1 | Subject), 
                       data = data01, 
                       family="binomial")

Output from Model 1 (converged):

summary(logit_model01)

Generalized linear mixed model fit by maximum likelihood (Laplace Approximation) [glmerMod]
Family: binomial  ( logit )
Formula: AOI_look ~ factor(Sex) * factor(Intervention) + (1 | Image) +  
         (1 | Subject)
Data: data01

      AIC       BIC    logLik  deviance  df.resid 
1519023.6 1519095.5 -759505.8 1519011.6   1182007 

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-2.6200 -0.9733  0.5504  0.8444  2.2317 

Random effects:
 Groups   Name        Variance Std.Dev.
 Subject  (Intercept) 0.2320   0.4816  
 Image    (Intercept) 0.1318   0.3631  
Number of obs: 1182013, groups:  Subject, 59; Image, 9

Fixed effects:
                                          Estimate Std. Error z value Pr(>|z|)    
(Intercept)                               0.052133   0.036172   1.441     0.15    
factor(Sex)Male                           0.187736   0.039967   4.697 2.64e-06 ***
factor(Intervention)Pre                   0.257262   0.006136  41.924  < 2e-16 ***
factor(Sex)Male:factor(Intervention)Pre  -0.220240   0.007932 -27.766  < 2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Correlation of Fixed Effects:
              (Intr) fc(S)M fc(I)P
fctr(Sex)M    -0.006               
factr(Int)Pr   0.011  0.012        
f(S)M:(I)P    -0.013 -0.018  -0.747

Model 2: Same as Model 1, but with Intervention = Pre set as the baseline before fitting (gives a convergence warning):

logit_model02 <- glmer(AOI_look ~ factor(Sex)*factor(Intervention) + 
                                  (1 | Image) + 
                                  (1 | Subject), 
                       data = data01, 
                       family="binomial")

Convergence warning:

Warning message:
In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv,  :
  Model failed to converge with max|grad| = 0.00210935 (tol = 0.002, component 1)

Although there was a warning, I am still able to get the output for Model 2 with summary():

summary(logit_model02)

Generalized linear mixed model fit by maximum likelihood (Laplace Approximation) [glmerMod]
Family: binomial  ( logit )
Formula: AOI_look ~ factor(Sex) * factor(Intervention) + (1 | Image) +  
         (1 | Subject)
Data: data01

      AIC       BIC    logLik  deviance  df.resid 
1519023.6 1519095.5 -759505.8 1519011.6   1182007 

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-2.6200 -0.9733  0.5504  0.8444  2.2317 

Random effects:
 Groups    Name        Variance Std.Dev.
 Subject   (Intercept) 0.2320   0.4816  
 Image     (Intercept) 0.1318   0.3631  
Number of obs: 1182013, groups:  Subject, 59; Image, 9

Fixed effects:
                                          Estimate Std. Error z value Pr(>|z|)    
(Intercept)                               0.309400   0.038106   8.120 4.68e-16 ***
factor(Sex)Male                          -0.032522   0.042520  -0.765    0.444    
factor(Intervention)Post                 -0.257262   0.006158 -41.774  < 2e-16 ***
factor(Sex)Male:factor(Intervention)Post  0.220234   0.007926  27.787  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Correlation of Fixed Effects:
              (Intr) fc(S)M fc(I)P
fctr(Sex)M    -0.107               
fctr(Int)Pst   0.009  0.024        
f(S)M:(I)P    -0.013 -0.025  -0.748
convergence code: 0
Model failed to converge with max|grad| = 0.00210935 (tol = 0.002, component 1)

Added information found in comments below Ben Bolker's answer, for clarity/completeness:

Model 3: Same as Model 1, but on aggregated data (0s and 1s summarized into successes and failures by Subject and Image combination):

The results are not the same as those from Model 1 (fit on the non-aggregated 0/1 data), even though Model 1 gave no convergence warning: the point estimates are all similar, but the SEs are not. The SEs for the interaction and for Intervention (within-subjects) are similar, while those for Sex (between-subjects) and the intercept are markedly different. (A sketch of the aggregation step is shown after the Model 3 output below.)

logit_model03 <- glmer(cbind(Successes, Failures) ~ factor(Sex)*factor(Intervention) +
                                                    (1 | Image) + 
                                                    (1 | Subject), 
                       data = data01_agg, 
                       family="binomial")

summary(logit_model03)

Fixed effects:
                                         Estimate Std. Error z value Pr(>|z|)    
(Intercept)                              0.052151   0.150564   0.346    0.729    
factor(Sex)Male                          0.187734   0.125563   1.495    0.135    
factor(Intervention)Pre                  0.257261   0.006227  41.314   <2e-16 ***
factor(Sex)Male:factor(Intervention)Pre -0.220239   0.008061 -27.322   <2e-16 ***
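
For reference, data01_agg was built by collapsing the 0/1 rows into successes and failures within each Subject/Image cell, along these lines (a sketch assuming dplyr; the actual aggregation code isn't shown here, and Sex and Intervention are assumed constant within each Subject/Image cell):

library(dplyr)

data01_agg <- data01 %>%
  group_by(Subject, Image, Sex, Intervention) %>%
  summarise(Successes = sum(AOI_look == 1),
            Failures  = sum(AOI_look == 0),
            .groups = "drop")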

To see if the sheer size of the raw dataset (over 1.67 million lines) could be causing an issue, I took a random sample of 15000 observations from my original dataset and refit the model with both the non-aggregated (0/1) and aggregated (success, failure) data. (I did this for both levels of baseline for Intervention, but am just showing the results for Intervention = Post here.)
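
The subsample was drawn roughly like this (a sketch; the seed is arbitrary, and smaller_agg_data was then built from this sample using the same aggregation as above):

set.seed(123)  # arbitrary, for reproducibility
smaller_raw_data <- data01[sample(nrow(data01), 15000), ]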

Model 4: Random sample of 15000 obs. from actual dataset, non-aggregated (0/1) outcome, Intervention = Post as baseline:

logit_model04 <- glmer(AOI_look ~ factor(Sex)*factor(Intervention) + 
                                  (1 | Image) + 
                                  (1 | Subject), 
                       data = smaller_raw_data, 
                       family="binomial")

summary(logit_model04)

Fixed effects:
                                        Estimate Std. Error z value Pr(>|z|)    
(Intercept)                             -0.03212    0.15564  -0.206 0.836505    
factor(Sex)Male                          0.28384    0.14121   2.010 0.044429 *  
factor(Intervention)Pre                  0.22727    0.06449   3.524 0.000425 ***
factor(Sex)Male:factor(Intervention)Pre -0.25061    0.08410  -2.980 0.002884 ** 

Model 5: Random sample of 15000 obs. from actual dataset, aggregated (success, failure) outcome, Intervention = Post as baseline:

logit_model05 <- glmer(cbind(Successes, Failures) ~ factor(Sex)*factor(Intervention) + 
                                                    (1 | Image) + 
                                                    (1 | Subject), 
                       data = smaller_agg_data, 
                       family="binomial")

summary(logit_model05)

Fixed effects:
                                        Estimate Std. Error z value Pr(>|z|)    
(Intercept)                             -0.03211    0.15569  -0.206 0.836582    
factor(Sex)Male                          0.28383    0.14126   2.009 0.044500 *  
factor(Intervention)Pre                  0.22727    0.06450   3.524 0.000425 ***
factor(Sex)Male:factor(Intervention)Pre -0.25061    0.08412  -2.979 0.002890 ** 

As you can see, the point estimates and SEs for the fixed effects in Models 4 and 5 agree to about the 4th decimal place. I also repeated this with Intervention = Pre as baseline, and the results are equivalent with regard to their similarity across the non-aggregated and aggregated models (as I expected would be the case).


Thus, using non-aggregated or aggregated data shouldn't matter in theory, at least for a reasonably-sized (i.e., not-too-large) dataset. This leads me to wonder if there is instability in the models when the dataset gets "too large." From ?lme4::convergence, there is reason to believe this could be the case: “Exploratory analyses suggest that (1) the naive estimation of the Hessian may fail for large data sets (number of observations greater than approximately 1e5).” So maybe this is the root of what I observed in models 1-3 above.
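
One way to check whether a warning like this is just numerical noise in the Hessian approximation is to recompute the gradient and Hessian with numDeriv, along the lines of the example in ?lme4::convergence (a sketch; logit_model02 is the refit that produced the warning):

library(numDeriv)

devfun <- update(logit_model02, devFunOnly = TRUE)  # deviance function of the fitted model
pars   <- unlist(getME(logit_model02, c("theta", "fixef")))
hess   <- hessian(devfun, pars)                     # Richardson-extrapolated Hessian
gr     <- grad(devfun, pars)                        # ...and gradient
scgrad <- solve(chol(hess), gr)                     # scaled gradient, as used by the convergence check
max(pmin(abs(scgrad), abs(gr)))                     # compare to the 0.002 tolerance in the warning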

In my case, I have the option to aggregate the data since both of my predictors are binary categorical, but for numeric predictors that don't allow aggregating, this would not be an option to reduce the computational load prior to fitting the model.

Meg
  • 1/2 @Ben Bolker - this is directly related to the question to which you just responded (https://stackoverflow.com/questions/63360751/specifying-random-effects-for-repeated-measures-in-logistic-mixed-model-in-r-lm/63367680#63367680). I actually wrote that question to "support" this one, in case it was a problem with how I coded the model. Since, with my other question, you felt the code I was using was reasonable, I wonder if you have any inclination as to why this is happening. – Meg Aug 12 '20 at 00:06
  • 2/2 I will try it with adding `(1|Subject:Trial)` as a random effect and see if that resolves it. Note this *is* the exact same scenario as https://stackoverflow.com/questions/63360751/specifying-random-effects-for-repeated-measures-in-logistic-mixed-model-in-r-lm/63367680#63367680, I just mentioned here that it's eye tracking data, and so the outcome is called *AOI_look* instead of *Binary_outcome*. Thanks again for your help. (Also, should this question be migrated to CrossValidated too, or is it alright here? To me, this question was more clearly about coding.) – Meg Aug 12 '20 at 00:07
  • Think I may be onto something: When I run the model on the aggregated instead of raw data, I get very similar parameter estimates (same until the 5th or 6th decimal place) to the model above that converged (first one, with `Post` as baseline), but the *p*-value for *Sex* is very different (0.135 vs. the 2.64e-06 given above), as well as for the intercept term. This makes me wonder if lmer() is "timing out" (or, in the latter model above, having a convergence problem) due to the size of the raw data (~1.6 million lines), and thus giving inaccurate inference. Working on it... Will report back. – Meg Aug 13 '20 at 20:59
  • Currently looking at https://rstudio-pubs-static.s3.amazonaws.com/33653_57fc7b8e5d484c909b615d8633c01d51.html, but think I need to increase memory if possible and try letting the model run longer. – Meg Aug 13 '20 at 21:21

2 Answers


don't take convergence checks as the be-all and end-all

Although it is certainly theoretically true that adjustments like changing the order of random terms, releveling factors, centering/scaling/etc., shouldn't change the outcome of model fitting at all, they do change the linear algebra slightly and so can tip the model diagnostics across a threshold where it reports non-convergence. The main thing to keep in mind is that warnings of model convergence need to be taken with a small grain of salt: they don't necessarily mean anything is wrong with the model, just that you should double-check. In fact, if you have looked carefully at the summary output (e.g. comparing log-likelihood and random-effect variances, which should be nearly identical) and the predictions (ditto), then you can conclude that the convergence warning doesn't really matter. Here is my code to check some simple predictions:

cc1 <- c(0.052133, 0.187736, 0.257262, -0.220240)
cc2 <- c(0.309400,-0.032522,-0.257262,0.220234)

df2 <- expand.grid(Sex=c("Female","Male"),Intervention=c("Pre","Post"))
df1 <- transform(df2,Intervention=relevel(Intervention,"Post"))


X1 <- model.matrix(~Sex*Intervention, data=df1)
X2 <- model.matrix(~Sex*Intervention, data=df2)
cbind(drop(X1 %*% cc1), drop(X2 %*% cc2))
##       [,1]     [,2]
## 1 0.309395 0.309400
## 2 0.276891 0.276878
## 3 0.052133 0.052138
## 4 0.239869 0.239850

all.equal(drop(X1 %*% cc1), drop(X2 %*% cc2))
## [1] "Mean relative difference: 4.78203e-05"

These differ at the level of about 4e-05, so unless you expect and need predictions to that level of accuracy, I wouldn't worry about it. (Given what you said in the comments below about differences in standard error calculations, I might also try merTools::predictInterval() to look for differences in uncertainty.) See also: ?lme4::convergence, ?lme4::troubleshooting, ?lme4::allFit
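
For example, a quick allFit() run refits the model with all available optimizers so you can check that they agree (a sketch; logit_model02 is the refit that warned, and more optimizers are tried if the optimx/dfoptim packages are installed):

aa <- lme4::allFit(logit_model02)
ss <- summary(aa)
ss$which.OK   # which optimizers completed without error
ss$llik       # log-likelihoods: should be essentially identical across optimizers
ss$fixef      # fixed-effect estimates: likewise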

don't bother releveling and refitting

Since all the level adjustments can be done by straightforward linear algebra, there are a variety of convenient downstream tools that let you make comparisons like the one you want without releveling and refitting. This is much faster and more convenient! I would start with the emmeans package; the effects package may also be useful.
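
For instance, the "simple effect" of Sex at each level of Intervention can be pulled straight from the first fit (a sketch; emmeans works from the fitted model, so no releveling or refitting is needed):

library(emmeans)

emm <- emmeans(logit_model01, ~ Sex | Intervention)  # estimated marginal means on the logit scale
pairs(emm)  # Male vs. Female within each Intervention level, with tests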

Ben Bolker
  • Comments are not for extended discussion; this conversation has been [moved to chat](https://chat.stackoverflow.com/rooms/219863/discussion-on-answer-by-ben-bolker-glmer-converges-with-one-factor-level-as-ba). – Samuel Liew Aug 15 '20 at 10:05

Reparameterising one factor to have a different reference level should make no difference, either to the convergence of the model or to the interpretation of its results.

Dealing with the results first, Intervention is a two-level factor. Therefore, the effect of reordering it so that Pre is the reference level (clearly a sensible thing to do) is to change the sign of the corresponding parameter estimate. (So, in your results, 0.257262 changes to -0.257262.) Nothing else should change: p-values, standard errors, etc. all remain the same, not just for Intervention, but for every other term in the model.

The fact that not only do things change but the model fails to converge suggests there is something wrong with your model.

logit_model <- glmer(AOI_look ~ factor(Sex)*factor(Intervention) + 
                                (1 | Image) + 
                                (1 | Subject), 
                     data = data01, 
                     family = "binomial")

I think there may be several issues here. Shouldn't Intervention be a within-subject effect? (You can't randomise the order of intervention within subject.) Shouldn't Image be a between-subject effect? (Image "A" shown to Patient 1 is the same Image "A" shown to Patient 2.) There may be a case for also including a within-patient Image effect.

Fitting longitudinal mixed effects models is a complex task. I think you may need to take detailed expert advice about your specific case, both to understand what model is appropriate to your study and how to fit it. The fact that a model converges does not mean it is the correct model.

Limey
  • 1/9 @Limey Thanks for your comments. Note I was not assuming it was the correct model because it converged, but rather because I thought for a long time about each variable that should be included, accounted for two levels of correlation (Subject and Trial), and the model gave reasonable estimates and p-values based on what the data have shown me in plots (which I discuss across many comments below, as the response is lengthy). – Meg Aug 10 '20 at 13:37
  • 2/9 You are correct about Intervention - it is within-subject. I didn’t/don’t know how to further account for this other than by specifying a random effect for participant and trial. Do you have a suggestion? – Meg Aug 10 '20 at 13:37
  • 3/9 And yes, trial A is always trial A, e.g., but Trial itself is not a variable of interest, which is why I didn’t fit it as a fixed effect, only random. I simply wanted to account for the fact that when a given participant looks at a certain image, he or she may be more or less interested in that image than another, so there may be additional cluster correlation due to image, above and beyond the correlation already present within an individual. Additional thoughts on how to better/appropriately model/account for this? – Meg Aug 10 '20 at 13:37
  • 4/9 Here is the remainder of my response to what you wrote, which continues to span multiple comments: Changing the baseline affects more than the sign of the estimate for that parameter, due to the interaction. As you say, for Intervention, the magnitude will stay the same and only the sign will change. However, all other estimates change (to this end, I updated my post to show the output of the reparameterized model with Intervention = Pre as baseline). This matters because I want to compare the estimated ln(odds) across different combinations of the two binary predictors. – Meg Aug 10 '20 at 13:38
  • 5/9 This is why I even wanted to change the baseline - although I can get all *estimates* of the ln(odds) I could ever want from the first model with algebra, I want the p-value for Sex for both pre- and post-intervention. For example, with the first parameterization (“Post” as baseline), I can draw the following conclusion: For Intervention = Post (baseline), there is a significant difference in the ln(odds) between males and females, according to the p-value associated with Sex (2.64e-06 < 0.0001). – Meg Aug 10 '20 at 13:38
  • 6/9 Now, I want to change the baseline of Intervention to “Pre” so I can get the equivalent associated p-value between males and females, but for *pre*-intervention. Plots of my data say there shouldn’t be a significant difference between the sexes pre-intervention, and – even though I got that convergence error – so does the summary() output of the model with “Pre” as baseline (p = 0.444 for Sex; again, I updated my post to show this output). – Meg Aug 10 '20 at 13:38
  • 7/9 Also, the estimate of the ln(odds) I get from the second model for comparing males to females when Intervention = Pre (baseline) is the same as what I calculate from my first model using algebra. Namely, in the second model, the estimate for Sex (which is the difference in the ln(odds) between males and females when holding Intervention constant at “Pre” (baseline)) is -0.032522. – Meg Aug 10 '20 at 13:38
  • 8/9 This can also be found from the first model: For pre-intervention, comparing males to females, the interaction now comes into play: beta1^ + beta3^ = 0.187736 + (-0.220240) = -0.032504 (assuming the values are off only due to rounding error). – Meg Aug 10 '20 at 13:38
  • 9/9 Thus, even with the failed convergence in the second parameterization, the output given by summary() for both models is reasonable/what I expected based on graphs of my data, and thus I would like to think I’m on the right track, but may need to specify the random effects slightly differently, e.g. – Meg Aug 10 '20 at 13:38
  • You can do all of this **much more easily** with the emmeans package ... – Ben Bolker Aug 13 '20 at 21:02
  • So the `emmeans` package *is* pretty nice ;) I still like doing the algebra for the various models though, haha. But `emmeans` is definitely more efficient. Thanks for the suggestion. – Meg Aug 15 '20 at 15:05