I'm trying to model the effect of several variables on the likelihood of a self-loop occurring using glmer in the lme4 package. It's a very large data set with >900,000 data points.

When I run the model I get this convergence warning:

SLMod <- glmer(SL ~ species*season + (1|code), data=SL, family=binomial)
Warning message:
In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv,  :
Model failed to converge with max|grad| = 0.0013493 (tol = 0.001, 
component 1)

And this is the summary output:

summary(SLMod)
Generalized linear mixed model fit by maximum likelihood (Laplace 
Approximation) ['glmerMod']
Family: binomial  ( logit )
Formula: SL ~ species * season + (1 | code)
Data: SL

  AIC       BIC    logLik  deviance  df.resid 
708076.5  708135.1 -354033.2  708066.5    906441 

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
 -1.6224 -0.4324 -0.3136 -0.1983  5.0722 

Random effects:
  Groups Name        Variance Std.Dev.
  code   (Intercept) 0.8571   0.9258  
 Number of obs: 906446, groups:  code, 180

 Fixed effects:
                                    Estimate Std. Error z value Pr(>|z|)    
 (Intercept)                             -1.29729    0.05944 -21.824  < 2e-16 ***
speciesSilvertip Shark                   0.05593    0.06390   0.875    0.381    
 seasonwet season                         0.09617    0.01008   9.537  < 2e-16 ***
 speciesSilvertip Shark:seasonwet season -0.10809    0.01354  -7.983 1.43e-15 ***
 ---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Correlation of Fixed Effects:
             (Intr) spcsSS ssnwts
 spcsSlvrtpS -0.585              
 seasonwtssn  0.009 -0.004       
 spcsSShrk:s -0.007  0.001 -0.744
 convergence code: 0
 Model failed to converge with max|grad| = 0.0013493 (tol = 0.001, component 1)

The data set records animal movements: consecutive detections at the same point, with the time difference between them calculated. If the time difference is more than 10 minutes, the detection is classed as a self-loop and coded 1; if it is under 10 minutes, it is coded 0. A sample of the data is below.

structure(list(code = structure(c(1L, 1L, 1L, 1L, 1L, 1L), .Label = 
"2388", class = "factor"), 
species = c("Silvertip Shark", "Silvertip Shark", "Silvertip Shark", 
"Silvertip Shark", "Silvertip Shark", "Silvertip Shark"), 
sex = c("F", "F", "F", "F", "F", "F"), TL = c(112, 112, 112, 
112, 112, 112), datetime = structure(c(1466247120, 1466247420, 
1467026100, 1469621400, 1469879640, 1470397200), class = c("POSIXct", 
"POSIXt"), tzone = ""), year = c("2016", "2016", "2016", 
"2016", "2016", "2016"), month = c(6, 6, 6, 7, 7, 8), hour = c(11, 
11, 12, 13, 12, 12), season = c("dry season", "dry season", 
"dry season", "dry season", "dry season", "dry season"), 
daynight = c("day", "day", "day", "day", "day", "day"), SL = c(0, 
0, 1, 1, 1, 1)), row.names = c(NA, 6L), class = "data.frame")
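
The coding rule described above can be sketched in base R. This is only an illustration: `flag_self_loops` and `threshold_mins` are hypothetical names, and the sketch assumes that the first detection for each tag has no previous gap and is scored 0, which matches the sample rows but may not match how the full data set was built.

```r
# Hypothetical helper: flag self-loops from the gap to the previous detection.
# Assumes the rows are already restricted to repeat detections at one point.
flag_self_loops <- function(code, datetime, threshold_mins = 10) {
  grp  <- as.character(code)
  secs <- as.numeric(datetime)           # POSIXct -> seconds since epoch
  ord  <- order(grp, secs)               # sort by tag, then by time

  # Gap (in minutes) to the previous detection within the same tag
  gap_sorted <- ave(secs[ord], grp[ord],
                    FUN = function(x) c(NA, diff(x))) / 60
  gap_mins <- numeric(length(secs))
  gap_mins[ord] <- gap_sorted            # restore the original row order

  # First detection per tag has no gap (NA) and is scored 0 by assumption
  as.numeric(!is.na(gap_mins) & gap_mins > threshold_mins)
}

# The six detection times from the sample above
dets <- data.frame(
  code = "2388",
  datetime = as.POSIXct(
    c(1466247120, 1466247420, 1467026100,
      1469621400, 1469879640, 1470397200),
    origin = "1970-01-01", tz = "UTC"
  )
)
dets$SL <- flag_self_loops(dets$code, dets$datetime)
dets
```

For these six rows the gaps are 5 minutes, then several days each, so the flags come out as 0, 0, 1, 1, 1, 1, the same as the `SL` column in the sample.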

I then randomly sampled 50% of the data set using this code:

SL50 <- SL %>% sample_frac(0.5)

I ran the same model on this subsample and it fitted with no warnings, so I wondered whether the size of the data set is the problem. However, I get a similar warning from a different model using the 50% sample, and that warning disappears when I fit it to 10% of the data.

SLMod <- glmer(SL ~ species*daynight + (1|code), data=SL50, 
family=binomial)
Warning message:
In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv,  :
Model failed to converge with max|grad| = 0.0010195 (tol = 0.001, component 1)

Is it possible there's an issue with the size of the data it's trying to process for each model? And are there any ways to deal with this?

mikejwilliamson

1 Answer

I should say up front that I don't understand the theory behind these models well enough to give you a thorough answer, but while experimenting with some made-up data I found a difference that may be helpful. Playing around with hypothetical examples can sometimes help you understand your own issue.

Here I have made up some data. There are three sets of random binomial data with different probabilities of being a 1: Sharks have a probability of 0.1, Turtles 0.7, and Gators 0.9. Notice that "night" and "day" simply alternate row by row throughout the data set, so there should be no real difference between the two.

data <- data.frame(
  X        = c(rbinom(100, 1, 0.1), rbinom(100, 1, 0.7), rbinom(100, 1, 0.9)),
  species  = c(rep("Shark", 100), rep("Turtle", 100), rep("Gator", 100)),
  daynight = c("night", "day"),  # recycled, so night/day alternate over all 300 rows
  ID       = as.factor(1:300)
)

> head(data)
  X species daynight ID
1 0   Shark    night  1
2 0   Shark      day  2
3 1   Shark    night  3
4 0   Shark      day  4
5 0   Shark    night  5
6 0   Shark      day  6

library(lme4)

mod1 <- glmer(X ~ species * daynight + (1 | ID), data = data, family = binomial(link = "logit"))
Warning messages:
1: In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv,  :
  unable to evaluate scaled gradient
2: In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv,  :
  Model failed to converge: degenerate  Hessian with 1 negative eigenvalues

mod2 <- glmer(X ~ species + (1 | ID), data = data, family = binomial(link = "logit"))

I get the warning messages when I include daynight, perhaps because there isn't enough variance for the model to converge, and I get no warning for mod2.
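
One way to look more closely at the made-up data is to count the observations per ID and per species/daynight cell. This is a minimal sketch that rebuilds the first data set with an arbitrary seed (the seed and the name `sim1` are additions for illustration, not part of the code above). Each ID appears exactly once here, whereas the question has 180 codes across 906,446 rows, so the random intercept in this toy example is estimated from a single 0/1 value per level. That may contribute to the warnings independently of daynight, although this has not been verified here.

```r
set.seed(42)  # arbitrary seed, only so the tables are reproducible

# Same construction as the first made-up data set
sim1 <- data.frame(
  X        = c(rbinom(100, 1, 0.1), rbinom(100, 1, 0.7), rbinom(100, 1, 0.9)),
  species  = c(rep("Shark", 100), rep("Turtle", 100), rep("Gator", 100)),
  daynight = c("night", "day"),  # recycled across all 300 rows
  ID       = as.factor(1:300)
)

# How many rows does each random-effect level have?
obs_per_id <- table(sim1$ID)
range(obs_per_id)

# How many rows fall in each species x daynight cell?
cell_counts <- xtabs(~ species + daynight, data = sim1)
cell_counts

# Observed proportion of 1s in each cell
with(sim1, tapply(X, list(species, daynight), mean))
```

Running the same three checks on the real data (with `code` in place of `ID`) shows how different the grouping structure is from this toy example.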

Now I have changed it so that there is a 'direction' to day/night. Notice that the probability of a 1 is always higher during the day in this hypothetical case.

data <- data.frame(
  X        = c(rbinom(50, 1, 0.1), rbinom(50, 1, 0.3),   # Shark: night, day
               rbinom(50, 1, 0.7), rbinom(50, 1, 0.9),   # Turtle: night, day
               rbinom(50, 1, 0.5), rbinom(50, 1, 0.6)),  # Gator: night, day
  species  = c(rep("Shark", 100), rep("Turtle", 100), rep("Gator", 100)),
  daynight = c(rep("night", 50), rep("day", 50)),  # recycled across the three species
  ID       = as.factor(1:300)
)

> head(data)
  X species daynight ID
1 0   Shark    night  1
2 0   Shark    night  2
3 0   Shark    night  3
4 0   Shark    night  4
5 0   Shark    night  5
6 0   Shark    night  6

mod1 <- glmer(X ~ species * daynight + (1 | ID), data = data, family = binomial(link = "logit"))

Here I get no such warning when I run the same mod1. This may be because there is more variance in the daynight term, but someone else will need to confirm theoretically what is going on here.

A simple solution may be to remove one of your variables, either species or daynight from the overall model, or perhaps you can include other environmental variables or day/time information that will help it converge.
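
Two other things that lme4's `?convergence` help page suggests trying are restarting the fit from the estimates it stopped at and switching optimizer. The sketch below shows both on simulated data. Everything about the simulation is an assumption for illustration (30 tags, 200 rows each, the species labels "A" and "B", and the effect sizes), and it is not a claim about what will happen with the real data set.

```r
library(lme4)

set.seed(1)  # arbitrary seed

# Simulated stand-in for the real data (all values hypothetical)
n_code <- 30
n_each <- 200
sim <- data.frame(
  code    = factor(rep(seq_len(n_code), each = n_each)),
  species = rep(c("A", "B"), each = n_each * n_code / 2),
  season  = sample(c("dry season", "wet season"),
                   n_code * n_each, replace = TRUE)
)
u   <- rnorm(n_code, sd = 0.9)  # random intercept per tag
eta <- -1.3 + u[as.integer(sim$code)] + 0.1 * (sim$season == "wet season")
sim$SL <- rbinom(nrow(sim), 1, plogis(eta))

fit <- glmer(SL ~ species * season + (1 | code),
             data = sim, family = binomial)

# 1. Restart from the estimates the first fit ended on
start_vals  <- getME(fit, c("theta", "fixef"))
fit_restart <- update(fit, start = start_vals)

# 2. Refit with a different optimizer and a larger evaluation budget
fit_bobyqa <- update(
  fit,
  control = glmerControl(optimizer = "bobyqa",
                         optCtrl = list(maxfun = 2e5))
)

# If the estimates agree closely, a borderline warning is less worrying
cbind(original = fixef(fit),
      restart  = fixef(fit_restart),
      bobyqa   = fixef(fit_bobyqa))
```

If the fixed effects and log-likelihoods barely change between the refits, a max|grad| that is only just above the 0.001 tolerance is probably not a sign of a badly wrong fit. lme4 also provides `allFit()` to refit a model with every optimizer it has available, which makes the same comparison more systematic.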

I know this isn't thorough, but hopefully it will help you start to play around with some of these hypothetical datasets to understand why it isn't converging for you.

Dylan_Gomes