issues with data size in glmer in lme4 in R: size of data set causing convergence issues

Question

I'm trying to model the effect of several variables on the likelihood of a self-loop occurring using glmer in the lme4 package. It's a very large data set with >900,000 data points.

When I run the model I get this warning:

SLMod <- glmer(SL ~ species*season + (1|code), data=SL, family=binomial)
Warning message:
In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv,  :
Model failed to converge with max|grad| = 0.0013493 (tol = 0.001, 
component 1)

And this is the output

summary(SLMod)
Generalized linear mixed model fit by maximum likelihood (Laplace 
Approximation) ['glmerMod']
Family: binomial  ( logit )
Formula: SL ~ species * season + (1 | code)
Data: SL

  AIC       BIC    logLik  deviance  df.resid 
708076.5  708135.1 -354033.2  708066.5    906441 

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
 -1.6224 -0.4324 -0.3136 -0.1983  5.0722 

Random effects:
  Groups Name        Variance Std.Dev.
  code   (Intercept) 0.8571   0.9258  
 Number of obs: 906446, groups:  code, 180

 Fixed effects:
                                    Estimate Std. Error z value Pr(>|z|)    
 (Intercept)                             -1.29729    0.05944 -21.824  < 2e-16 ***
 speciesSilvertip Shark                   0.05593    0.06390   0.875    0.381    
 seasonwet season                         0.09617    0.01008   9.537  < 2e-16 ***
 speciesSilvertip Shark:seasonwet season -0.10809    0.01354  -7.983 1.43e-15 ***
 ---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Correlation of Fixed Effects:
             (Intr) spcsSS ssnwts
 spcsSlvrtpS -0.585              
 seasonwtssn  0.009 -0.004       
 spcsSShrk:s -0.007  0.001 -0.744
 convergence code: 0
 Model failed to converge with max|grad| = 0.0013493 (tol = 0.001, component 1)

It's a data set of animal movements: consecutive detections at the same point, with the time difference between them calculated. If the time difference is more than 10 minutes, the detection is classed as a self-loop and coded 1; if it is under 10 minutes, it is coded 0. A sample of the data is below.

structure(list(code = structure(c(1L, 1L, 1L, 1L, 1L, 1L), .Label = 
"2388", class = "factor"), 
species = c("Silvertip Shark", "Silvertip Shark", "Silvertip Shark", 
"Silvertip Shark", "Silvertip Shark", "Silvertip Shark"), 
sex = c("F", "F", "F", "F", "F", "F"), TL = c(112, 112, 112, 
112, 112, 112), datetime = structure(c(1466247120, 1466247420, 
1467026100, 1469621400, 1469879640, 1470397200), class = c("POSIXct", 
"POSIXt"), tzone = ""), year = c("2016", "2016", "2016", 
"2016", "2016", "2016"), month = c(6, 6, 6, 7, 7, 8), hour = c(11, 
11, 12, 13, 12, 12), season = c("dry season", "dry season", 
"dry season", "dry season", "dry season", "dry season"), 
daynight = c("day", "day", "day", "day", "day", "day"), SL = c(0, 
0, 1, 1, 1, 1)), row.names = c(NA, 6L), class = "data.frame")
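
For readers who want to reproduce the coding, here is a minimal base-R sketch of the rule described above, applied to the six detection times in the sample. It assumes the gap is measured back to the previous detection of the same animal, that rows are already sorted by time within each animal, and that an animal's first detection is coded 0. These are assumptions inferred from the sample, not stated rules, and the helper name flag_self_loop is made up for illustration. The "same point" condition is ignored because the sample has no location column.

```r
# Hypothetical helper: flag a detection as a self-loop (1) when the gap to the
# previous detection of the same animal exceeds `threshold_min` minutes.
# Assumes rows are sorted by time within each animal.
flag_self_loop <- function(datetime, code, threshold_min = 10) {
  secs <- as.numeric(datetime)
  # Gap in minutes to the previous detection within each animal; NA for the first
  gap_min <- ave(secs, code, FUN = function(x) c(NA, diff(x))) / 60
  # Assumption: the first detection of each animal has no gap and is coded 0
  ifelse(is.na(gap_min), 0, as.numeric(gap_min > threshold_min))
}

# Detection times copied from the sample (seconds since 1970-01-01)
datetime <- as.POSIXct(c(1466247120, 1466247420, 1467026100,
                         1469621400, 1469879640, 1470397200),
                       origin = "1970-01-01", tz = "UTC")
code <- factor(rep("2388", 6))

sl <- flag_self_loop(datetime, code)
sl
# 0 0 1 1 1 1, which matches the SL column in the sample
```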

I randomly sampled 50% of my data set using this code:

SL50 <- SL %>% sample_frac(0.5)

I ran the same model on this subset and it fitted with no warnings, which made me wonder whether the size of the data set is the problem. However, I get a similar warning with a different model on the 50% sample, and it disappears when I fit that model to 10% of the data.

SLMod <- glmer(SL ~ species*daynight + (1|code), data=SL50, 
family=binomial)
Warning message:
In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv,  :
Model failed to converge with max|grad| = 0.0010195 (tol = 0.001, 
component 1)

Is it possible there's an issue with the size of the data it's trying to process for each model? And are there any ways to deal with this?

Yes, code is animal ID. There are multiple animals as part of the data set so have included that as a random effect. — mikejwilliamson, Apr 26 '19 at 07:52

score 1 · Answer 1 · answered Apr 26 '19 at 15:39

I am going to start by saying that I do not understand the theory behind these models well enough to give you a thorough answer, but in experimenting with some data I have found a difference that may be helpful. Sometimes playing around with hypothetical examples can help you understand your issue.

Here I have made up some data. There are three sets of random binomial data with different probabilities of being a 1. Sharks have a 0.1 probability, Turtles have 0.7, and Gators 0.9. But notice that I have repeated "night" and "day" over and over again throughout the dataset. So there should be no real difference between the two.

data <- data.frame("X" = c(rbinom(100, 1, 0.1), rbinom(100, 1, 0.7), rbinom(100, 1, 0.9)),
                   "species" = c(rep("Shark", 100), rep("Turtle", 100), rep("Gator", 100)),
                   "daynight" = c("night", "day"), "ID" = as.factor(c(1:300)))

> head(data)
  X species daynight ID
1 0   Shark    night  1
2 0   Shark      day  2
3 1   Shark    night  3
4 0   Shark      day  4
5 0   Shark    night  5
6 0   Shark      day  6

library(lme4)

mod1 <- glmer(X ~ species * daynight + (1 | ID), data = data, family = binomial(link = "logit"))
Warning messages:
1: In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv,  :
  unable to evaluate scaled gradient
2: In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv,  :
  Model failed to converge: degenerate  Hessian with 1 negative eigenvalues

mod2 <- glmer(X ~ species + (1 | ID), data = data, family = binomial(link = "logit"))

I get the warning when I include daynight, perhaps because daynight explains too little of the variation for the model to converge; mod2 gives no warning.
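
One thing that may be worth checking before attributing the warning to daynight: in this simulated data every ID appears exactly once, so the random intercept (1|ID) has a single binary observation per level, which generally leaves its variance very weakly identified. The sketch below rebuilds data of the same shape (the seed and the object name sim are additions for this sketch) and tabulates observations per ID and per species-by-daynight cell. It is a quick diagnostic, not a proof of why the fit fails.

```r
set.seed(1)  # arbitrary seed, added only for reproducibility
sim <- data.frame(
  X        = c(rbinom(100, 1, 0.1), rbinom(100, 1, 0.7), rbinom(100, 1, 0.9)),
  species  = rep(c("Shark", "Turtle", "Gator"), each = 100),
  daynight = rep(c("night", "day"), times = 150),
  ID       = factor(1:300)
)

# Observations per grouping level: here every ID has exactly one row
obs_per_id <- table(sim$ID)
range(obs_per_id)

# Cell sizes and observed proportion of 1s for each species x daynight cell
cell_n    <- xtabs(~ species + daynight, data = sim)
cell_prop <- tapply(sim$X, list(sim$species, sim$daynight), mean)
cell_n
round(cell_prop, 2)
```

In the real shark data each code has many detections, so this particular concern would not carry over directly; the cell tabulation is still a cheap way to look for empty or near-constant cells.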

Now I have changed it so that there is a 'direction' of day/night. Notice that day is always higher in this hypothetical case.

data <- data.frame("X" = c(rbinom(50, 1, 0.1), rbinom(50, 1, 0.3),
                           rbinom(50, 1, 0.7), rbinom(50, 1, 0.9),
                           rbinom(50, 1, 0.5), rbinom(50, 1, 0.6)),
                   "species" = c(rep("Shark", 100), rep("Turtle", 100), rep("Gator", 100)),
                   "daynight" = c(rep("night", 50), rep("day", 50)), "ID" = as.factor(c(1:300)))

> head(data)
  X species daynight ID
1 0   Shark    night  1
2 0   Shark    night  2
3 0   Shark    night  3
4 0   Shark    night  4
5 0   Shark    night  5
6 0   Shark    night  6

mod1 <- glmer(X ~ species * daynight + (1 | ID), data = data, family = binomial(link = "logit"))

Here I get no such warning when I run the same mod1. This may be because there is more variation associated with the daynight term, but someone else will need to confirm theoretically what is going on here.

A simple solution may be to remove one of your variables, either species or daynight, from the model, or to include other environmental variables or day/time information that might help it converge.
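
For the original large-data warning, where max|grad| is only just above the tolerance, the lme4 help page ?convergence describes some standard checks: restart the fit from the previous estimates, or refit with a different optimizer and compare the results. The sketch below uses a small simulated data set (toy, fit and every value in it are made up for illustration) rather than the 900,000-row data, so it shows the calls, not the results you would get.

```r
library(lme4)

set.seed(42)  # arbitrary seed for the made-up data
n_animals <- 30
toy <- data.frame(
  code   = factor(rep(seq_len(n_animals), each = 40)),
  season = rep(c("dry season", "wet season"), times = n_animals * 20)
)
# Made-up animal-level effects and a small season effect on the logit scale
animal_eff <- rnorm(n_animals, sd = 0.9)
eta <- -1.3 + 0.1 * (toy$season == "wet season") + animal_eff[as.integer(toy$code)]
toy$SL <- rbinom(nrow(toy), 1, plogis(eta))

fit <- glmer(SL ~ season + (1 | code), data = toy, family = binomial)

# 1. Restart from the previous estimates, with a larger evaluation budget
start_vals  <- getME(fit, c("theta", "fixef"))
fit_restart <- update(fit, start = start_vals,
                      control = glmerControl(optCtrl = list(maxfun = 2e5)))

# 2. Refit with a different optimizer and compare the fixed effects
fit_bobyqa <- update(fit, control = glmerControl(optimizer = "bobyqa",
                                                 optCtrl = list(maxfun = 2e5)))
cbind(default = fixef(fit), restart = fixef(fit_restart), bobyqa = fixef(fit_bobyqa))
```

If the estimates and log-likelihoods agree closely across these refits, the warning is often treated as a false positive; lme4 also provides allFit() to run the model through all available optimizers in one call.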

I know this isn't thorough, but hopefully it will help you start to play around with some of these hypothetical datasets to understand why it isn't converging for you.

OK great thanks for taking the time out to work on that. I'll have a further play and see what I can come up with. — mikejwilliamson, Apr 29 '19 at 07:18
