I am working on a model that includes several random effects (REs) and a spline for one of the variables, so I am trying to use gam(). However, I hit a memory-exhausted error (even when I run it on a cluster with 128 GB of RAM). This happens even when I run the simplest of models with just one RE. The same models (minus the spline) run smoothly and in just a few seconds (or minutes for the full model) when I use lmer() instead.
Does anyone know why gam() and lmer() behave so differently here, and are there any potential solutions?
Here's some code with simulated data and the simplest of models:
library(mgcv)
library(lme4)
set.seed(1234)
person_n <- 38000 # number of people (grouping variable)
n_j <- 15 # number of data points per person
B1 <- 3 # beta for the main predictor
n <- person_n * n_j
person_id <- gl(person_n, k = n_j) #creating the grouping variable
person_RE <- rep(rnorm(person_n), each = n_j) # person-level random intercepts, repeated for each data point
x <- rnorm(n) # creating x as a normal dist centered at 0 and sd = 1
error <- rnorm(n)
#putting it all together
y <- B1 * x + person_RE + error
dat <- data.frame(y, person_id, x)
m1 <- lmer(y ~ x + (1 | person_id), data = dat)
g1 <- gam(y ~ x + s(person_id, bs = "re"), method = "REML", data = dat)
m1 runs in just a couple of seconds on my computer, whereas g1 hits the error:
Error: vector memory exhausted (limit reached?)
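For scale, here is a rough back-of-the-envelope check of how much memory a dense model matrix for g1 would need. This is only a sketch under an assumption I have not verified by profiling: that gam() stores the s(person_id, bs = "re") term as a dense matrix with one column per level of person_id. The names n_rows, n_cols, dense_gb and sparse_nonzeros are just illustrative.

```r
# Sizes taken from the simulation above
person_n <- 38000                 # number of people (levels of person_id)
n_j      <- 15                    # data points per person

n_rows <- person_n * n_j          # 570,000 observations
n_cols <- person_n + 2            # ASSUMPTION: one column per person, plus intercept and x

# A dense double-precision matrix needs 8 bytes per entry
dense_gb <- n_rows * n_cols * 8 / 1e9
dense_gb                          # well above the 128 GB available on the cluster

# For comparison, a sparse indicator matrix for the same grouping factor
# has exactly one nonzero entry per row
sparse_nonzeros <- n_rows
sparse_nonzeros
```

If that assumption holds, a single copy of the matrix would already exceed 128 GB, while a sparse representation of the same grouping structure only needs to store 570,000 nonzero entries, which might explain the gap between the two functions.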