
I'm using lme4 to build a collaborative filter and am running into convergence issues. While working through the resources below to resolve them, I hit a new error:

Error in ans.ret[meth, ] <- c(ans$par, ans$value, ans$fevals, ans$gevals, :
number of items to replace is not a multiple of replacement length

This happened after the model had been running toward convergence for ~48 hours.

  1. https://rstudio-pubs-static.s3.amazonaws.com/33653_57fc7b8e5d484c909b615d8633c01d51.html
  2. https://stats.stackexchange.com/questions/242109/model-failed-to-converge-warning-in-lmer

    note: optimx with nlminb seems best, then L-BFGS-B (see the sketch below)
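For concreteness, here is how those optimizer choices map onto `glmerControl` (a sketch following the write-ups above; `bobyqa` is lme4's built-in alternative, included only as one more option to try):

```
library(lme4); library(optimx)

# optimx's nlminb method (used in the model below)
ctrl_nlminb <- glmerControl(optimizer= 'optimx',
                            optCtrl= list(method= 'nlminb'))
# optimx's L-BFGS-B method, the runner-up per the note above
ctrl_lbfgsb <- glmerControl(optimizer= 'optimx',
                            optCtrl= list(method= 'L-BFGS-B'))
# lme4's built-in bobyqa, for comparison
ctrl_bobyqa <- glmerControl(optimizer= 'bobyqa')
```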

I have a model that's structured as follows:

library(lme4); library(optimx)
library(stringi)
library(data.table)

set.seed(1423L)
# highly imbalanced outcome variable
y <- sample.int(2L, size= 910000, replace=T, prob= c(0.98, 0.02)) - 1L
# product biases
prod <- sample(letters, size= 910000, replace=T)
#  user biases
my_grps <- stringi::stri_rand_strings(n= 35000, length= 10)
grps <- rep(my_grps, each= 26)
# binary and categorical covariates
x1 <- sample.int(2L, size= 910000, replace=T, prob= c(0.9, 0.1)) - 1L
x2 <- sample.int(2L, size= 910000, replace=T, prob= c(0.9, 0.1)) - 1L
x3 <- sample.int(2L, size= 910000, replace=T, prob= c(0.9, 0.1)) - 1L
x4 <- sample(LETTERS[1:5], size= 910000, replace=T)

dt <- data.table(y= y,
                 prod= prod, grps= grps,
                 x1= x1, x2= x2, x3= x3, x4= x4)

# mixed logit: product fixed effects (no intercept) + per-user random intercepts
lmer1 <- glmer(y ~ -1 + prod + x1 + x2 + x3 + x4 + (1|grps),
               data= dt, family= binomial(link= "logit"),
               control= glmerControl(optimizer= 'optimx', optCtrl= list(method= 'nlminb')))

I haven't verified that the data above reproduce the error, but that's the model setup. The message looks like R's generic length-mismatch complaint (a minimal repro follows), but I don't understand why it arises here. Any help would be appreciated.
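To illustrate what the message means in isolation, here is a sketch reproducing just the error with base-R row assignment (loosely modeled on the per-method results matrix that `ans.ret` presumably is, not a claim about where optimx actually goes wrong):

```
# a named results matrix with one row per optimizer method
m <- matrix(0, nrow= 2, ncol= 3,
            dimnames= list(c("nlminb", "L-BFGS-B"), NULL))
# assigning 4 values into a 3-column row raises the same error:
m["nlminb", ] <- c(1, 2, 3, 4)
# Error in m["nlminb", ] <- c(1, 2, 3, 4) :
#   number of items to replace is not a multiple of replacement length
```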

Reported as lme4 issue #425.

NOTE: in my true use case, I have closer to 15.5M observations and 30-50 products, where each product has a different average response rate (y).

I have also switched from a kNN approach (the typical collaborative filter) to HLM because R is poorly optimized for kNN at scale; something like annoy would be better suited, but I have yet to try it out (see the sketch below).
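For reference, annoy is reachable from R; here is a minimal sketch using the RcppAnnoy package (my assumption of the natural binding to try, untested at my scale):

```
library(RcppAnnoy)

set.seed(1423L)
n <- 1000L; d <- 20L
X <- matrix(rnorm(n * d), nrow= n)   # toy user-feature matrix

idx <- new(AnnoyEuclidean, d)        # index over d-dimensional vectors
for (i in seq_len(n)) idx$addItem(i - 1L, X[i, ])  # annoy item ids are 0-based
idx$build(10L)                       # 10 trees; more trees => better recall

idx$getNNsByItem(0L, 5L)             # 5 approximate neighbours of item 0
```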

alexwhitworth
  • Just a comment on trying to reproduce your example, all your calls to `sample.int` include a `probs` argument but the argument in the base version of `sample.int` is `prob` (i.e. no `s`) – Douglas Bates Jun 23 '17 at 18:14
  • @DouglasBates -- thanks, will edit / have edited – alexwhitworth Jun 23 '17 at 18:19
  • There are no problems fitting such a model using the [`MixedModels`](https://github.com/dmbates/MixedModels.jl) package for Julia – Douglas Bates Jun 23 '17 at 18:30
  • I'm going to have a little bit of trouble tracking this down because `ans.ret` doesn't appear in our code base at all as far as I can tell ... – Ben Bolker Jun 26 '17 at 16:21
  • @BenBolker I would **guess** it's in the `optimx` library. But ultimately I'll likely need to switch to either Julia (below) or another method; in my real setting, I can't easily wait for 48-72+ hours for the model to (maybe/maybe not) converge.... I have been told we have an internal tool for CF-type models. The joys of good data infrastructure teams where you work :) – alexwhitworth Jun 26 '17 at 16:32
  • yes, it's in `optimx:::optimx.run` – Ben Bolker Jun 26 '17 at 16:36

1 Answer


Fitting the same model with the [MixedModels](https://github.com/dmbates/MixedModels.jl) package for Julia:

```
julia> @time m1 = fit!(glmm(@formula(y ~ 0 + prod + x1 + x2 + x3 + x4 + (1|grps)), dt, Bernoulli()), fast = true, verbose=true)
f_1: 180528.00386 [1.0]
f_2: 187488.87167 [1.75]
f_3: 177702.14693 [0.25]
f_4: 177671.46777 [0.112452]
f_5: 177676.1792 [0.152245]
f_6: 177667.49847 [0.0374517]
f_7: 177667.32134 [0.0285566]
f_8: 177667.11503 [0.0108968]
f_9: 177667.08367 [0.00339678]
f_10: 177667.08031 [0.000203859]
f_11: 177667.08056 [0.000953859]
f_12: 177667.0803 [1.35223e-7]
f_13: 177667.0803 [7.51352e-5]
f_14: 177667.0803 [7.63522e-6]
f_15: 177667.0803 [8.85223e-7]
f_16: 177667.0803 [0.0]
f_17: 177667.0803 [6.76114e-8]
f_18: 177667.0803 [1.72723e-7]
f_19: 177667.0803 [1.20167e-7]
f_20: 177667.0803 [1.4227e-7]
f_21: 177667.0803 [1.38746e-7]
f_22: 177667.0803 [1.36985e-7]
f_23: 177667.0803 [1.36089e-7]
f_24: 177667.0803 [1.34357e-7]
f_25: 177667.0803 [1.35656e-7]
f_26: 177667.0803 [1.35439e-7]
f_27: 177667.0803 [1.35323e-7]
f_28: 177667.0803 [1.35273e-7]
f_29: 177667.0803 [1.35248e-7]
f_30: 177667.0803 [0.0]
 75.913228 seconds (174.16 k allocations: 19.486 GiB, 8.10% gc time)
Generalized Linear Mixed Model fit by minimizing the Laplace approximation to the deviance
  Formula: y ~ 0 + prod + x1 + x2 + x3 + x4 + (1 | grps)
  Distribution: Distributions.Bernoulli{Float64}
  Link: GLM.LogitLink()

  Deviance (Laplace approximation): 177667.0803

Variance components:
          Column   Variance Std.Dev.
 grps (Intercept)         0        0

 Number of obs: 910000; levels of grouping factors: 35000

Fixed-effects parameters:
           Estimate Std.Error   z value P(>|z|)
prod: a    -3.82317 0.0402319  -95.0283  <1e-99
prod: b    -3.87486 0.0411777  -94.1009  <1e-99
prod: c     -3.8979 0.0414131  -94.1226  <1e-99
prod: d    -3.90172 0.0416467  -93.6862  <1e-99
prod: e    -3.94375 0.0423702  -93.0786  <1e-99
prod: f    -3.87321 0.0411412  -94.1443  <1e-99
prod: g    -3.84563  0.040832  -94.1818  <1e-99
prod: h    -3.85295 0.0407992  -94.4371  <1e-99
prod: i    -3.86082 0.0408777   -94.448  <1e-99
prod: j    -3.92742 0.0422041  -93.0577  <1e-99
prod: k    -3.90827 0.0417974   -93.505  <1e-99
prod: l    -3.90168 0.0415682   -93.862  <1e-99
prod: m    -3.93383 0.0421348  -93.3629  <1e-99
prod: n    -3.82755 0.0403628  -94.8286  <1e-99
prod: o    -3.89546 0.0416489  -93.5311  <1e-99
prod: p    -3.91643 0.0418437  -93.5966  <1e-99
prod: q    -3.88423 0.0414074  -93.8054  <1e-99
prod: r     -3.9031 0.0416133  -93.7944  <1e-99
prod: s    -3.85363 0.0407327  -94.6079  <1e-99
prod: t    -3.92431 0.0419838   -93.472  <1e-99
prod: u    -3.91551 0.0417962   -93.681  <1e-99
prod: v    -3.92217 0.0417068  -94.0415  <1e-99
prod: w    -3.90503  0.041674  -93.7043  <1e-99
prod: x    -3.81516 0.0402678  -94.7447  <1e-99
prod: y    -3.86918 0.0410894  -94.1648  <1e-99
prod: z    -3.83903 0.0404826  -94.8316  <1e-99
x1        0.0302483 0.0247737   1.22098  0.2221
x2       -0.0311121 0.0253477  -1.22741  0.2197
x3        0.0183217 0.0248309  0.737858  0.4606
x4: B     0.0104487 0.0235136  0.444368  0.6568
x4: C    -0.0170338 0.0236728 -0.719553  0.4718
x4: D    -0.0356445 0.0238845  -1.49237  0.1356
x4: E    -0.0303572  0.023757  -1.27782  0.2013

```

Notice that the fit took just over a minute (on a desktop with a 4-core i5 processor and 8 GB of memory).

Without `fast = true` it will take much longer but arrive at essentially the same answers.

It is not surprising that the estimated standard deviation of the random effects is 0: it represents the excess variability associated with the groups beyond that induced by the inherent variability of the response, and these simulated data contain no group effect at all. The coefficients for the levels of prod are significant because there is no intercept in the model, so each one estimates a product's baseline log-odds rather than a contrast with a reference level.
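A quick check of that last point (a one-liner, in R for brevity): the data were simulated with a 2% response rate, so the prod estimates should all sit near the logit of 0.02, which they do:

```
qlogis(0.02)   # log(0.02 / 0.98)
# [1] -3.89182
# compare with the prod estimates above, all between about -3.94 and -3.82
```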

Douglas Bates
  • Dr Bates, do you use the [Rjulia](https://github.com/armgong/RJulia) package? [Or julia native](http://www.stat.wisc.edu/~bates/JuliaForRProgrammers.pdf)? – alexwhitworth Jun 23 '17 at 19:17
  • Native Julia. The `RCall`, `RData` and `Feather` packages in Julia allow the user to import data from R to Julia. Trying to run Julia within R is much more difficult than connecting to R from Julia. For those who have tried to do such things, the fact that `RCall` is written entirely in Julia, without any "glue" code in C/C++, is amazing. – Douglas Bates Jun 23 '17 at 19:23
  • Okay; I'm waiting to accept this answer until I can replicate it on real data in Julia (i.e., after learning a bare minimum of Julia) – alexwhitworth Jun 23 '17 at 19:35