I'm trying to model the effect of several variables on the likelihood of a self-loop occurring, using glmer from the lme4 package. It's a very large data set with >900,000 data points.
When I try to run the model I get this warning:
SLMod <- glmer(SL ~ species*season + (1|code), data=SL, family=binomial)
Warning message:
In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
Model failed to converge with max|grad| = 0.0013493 (tol = 0.001,
component 1)
And this is the output:
summary(SLMod)
Generalized linear mixed model fit by maximum likelihood (Laplace Approximation) ['glmerMod']
 Family: binomial  ( logit )
Formula: SL ~ species * season + (1 | code)
   Data: SL

      AIC       BIC    logLik  deviance  df.resid
 708076.5  708135.1 -354033.2  708066.5    906441

Scaled residuals:
    Min      1Q  Median      3Q     Max
-1.6224 -0.4324 -0.3136 -0.1983  5.0722

Random effects:
 Groups Name        Variance Std.Dev.
 code   (Intercept) 0.8571   0.9258
Number of obs: 906446, groups:  code, 180

Fixed effects:
                                        Estimate Std. Error z value Pr(>|z|)
(Intercept)                             -1.29729    0.05944 -21.824  < 2e-16 ***
speciesSilvertip Shark                   0.05593    0.06390   0.875    0.381
seasonwet season                         0.09617    0.01008   9.537  < 2e-16 ***
speciesSilvertip Shark:seasonwet season -0.10809    0.01354  -7.983 1.43e-15 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Correlation of Fixed Effects:
            (Intr) spcsSS ssnwts
spcsSlvrtpS -0.585
seasonwtssn  0.009 -0.004
spcsSShrk:s -0.007  0.001 -0.744
convergence code: 0
Model failed to converge with max|grad| = 0.0013493 (tol = 0.001, component 1)
It's a data set of animal movements: consecutive detections at the same point, with the time difference between them calculated. If the time difference is >10 minutes, the movement is classed as a self-loop and coded 1; if under 10 minutes, 0. A sample of the data is below.
structure(list(code = structure(c(1L, 1L, 1L, 1L, 1L, 1L), .Label =
"2388", class = "factor"),
species = c("Silvertip Shark", "Silvertip Shark", "Silvertip Shark",
"Silvertip Shark", "Silvertip Shark", "Silvertip Shark"),
sex = c("F", "F", "F", "F", "F", "F"), TL = c(112, 112, 112,
112, 112, 112), datetime = structure(c(1466247120, 1466247420,
1467026100, 1469621400, 1469879640, 1470397200), class = c("POSIXct",
"POSIXt"), tzone = ""), year = c("2016", "2016", "2016",
"2016", "2016", "2016"), month = c(6, 6, 6, 7, 7, 8), hour = c(11,
11, 12, 13, 12, 12), season = c("dry season", "dry season",
"dry season", "dry season", "dry season", "dry season"),
daynight = c("day", "day", "day", "day", "day", "day"), SL = c(0,
0, 1, 1, 1, 1)), row.names = c(NA, 6L), class = "data.frame")
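For context, the SL column was derived roughly like this (a simplified dplyr sketch; `detections` is a placeholder name for the raw detection data, and I've left out the grouping by detection point):

```r
library(dplyr)

# Sketch: within each animal (code), take the time difference between
# consecutive detections and code gaps > 10 minutes as a self-loop (1).
SL <- detections %>%
  arrange(code, datetime) %>%
  group_by(code) %>%
  mutate(timediff = as.numeric(difftime(datetime, lag(datetime),
                                        units = "mins")),
         SL = ifelse(timediff > 10, 1, 0)) %>%
  ungroup()
```

(The first detection per animal has no previous detection, so its timediff and SL are NA and get dropped.)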
I randomly sampled 50% of the data set using this code (with dplyr):
library(dplyr)
SL50 <- SL %>% sample_frac(0.5)
I then ran the same model on this subset and it fit with no warnings, so I wondered whether the size of the data set is the issue. However, I get a similar warning from a different model using the 50% sample, which in turn disappears when I fit that model to a 10% sample.
SLMod <- glmer(SL ~ species*daynight + (1|code), data=SL50,
family=binomial)
Warning message:
In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
Model failed to converge with max|grad| = 0.0010195 (tol = 0.001,
component 1)
Is it possible there's an issue with the amount of data each model has to process? And are there any ways to deal with this warning?
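For what it's worth, the lme4 convergence help page (?convergence) suggests trying a different optimizer or raising the iteration limit, so I was considering something like the following (a sketch only; I haven't verified it resolves the warning on my full data):

```r
library(lme4)

# Refit with the bobyqa optimizer and a higher function-evaluation limit
SLMod2 <- glmer(SL ~ species * season + (1 | code), data = SL,
                family = binomial,
                control = glmerControl(optimizer = "bobyqa",
                                       optCtrl = list(maxfun = 2e5)))

# Or refit with every available optimizer and check whether the
# estimates agree, which would suggest the warning is a false positive
summary(allFit(SLMod))
```

Would this be a sensible approach here, or does the pattern of the warning appearing only on larger subsets point to something else?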