
As a follow-up to the question Creating confidence intervals for regression curve in GLMM using Bootstrapping, I am interested in obtaining the correct values of a regression curve and its associated confidence interval curves.

Consider a GLMM with one response variable, two continuous fixed effects, and one random effect. Here is some fake data:

library(dplyr)

set.seed(1129)
x1 <- runif(100, 0, 1)        # first continuous predictor
x2 <- rnorm(100, 0.5, 0.4)    # second continuous predictor
f1 <- gl(n = 5, k = 20)       # grouping factor with 5 levels
rnd1 <- rnorm(5, 0.5, 0.1)    # one random intercept per level

my_data <- data.frame(x1 = x1, x2 = x2, f1 = f1)
modmat <- model.matrix(~ x1 + x2, my_data)
fixed <- c(-0.12, 0.35, 0.09)

# Note: rnd1 (length 5) is recycled along the 100 rows here, so the simulated
# group effects do not line up with f1; rnd1[f1] would align them with the groups
y <- modmat %*% fixed + rnd1

# rescale y to [0, 1] and round to one decimal
my_data$y <- ((y - min(y)) / max(y - min(y))) %>% round(digits = 1)
rm(y)

The GLMM that I fit looks like this:

library(lme4)

m1 <- glmer(y ~ x1 + x2 + (1 | f1), my_data, family = "binomial")
summary(m1)
Generalized linear mixed model fit by maximum likelihood (Laplace Approximation) ['glmerMod']
 Family: binomial  ( logit )
Formula: y ~ x1 + x2 + (1 | f1)
   Data: my_data

     AIC      BIC   logLik deviance df.resid 
    65.7     76.1    -28.8     57.7       96 

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-8.4750 -0.7042 -0.0102  1.5904 14.5919 

Random effects:
 Groups Name        Variance  Std.Dev. 
 f1     (Intercept) 1.996e-10 1.413e-05
Number of obs: 100, groups:  f1, 5

Fixed effects:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)   -9.668      2.051  -4.713 2.44e-06 ***
x1            12.855      2.659   4.835 1.33e-06 ***
x2             4.875      1.278   3.816 0.000136 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Correlation of Fixed Effects:
   (Intr) x1    
x1 -0.970       
x2 -0.836  0.734
convergence code: 0
boundary (singular) fit: see ?isSingular
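
(Side note: the singular-fit message can be checked directly with isSingular() from lme4. The near-zero random-intercept variance is plausibly a consequence of the recycled rnd1 noted in the data-generation code above, since the simulated group effects do not line up with f1.)

isSingular(m1)  # TRUE: the random-intercept variance is estimated at ~0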

Plotting y vs x1:

plot(y ~ x1, my_data)

[Figure: scatter plot of y against x1]

It should be possible to derive a regression curve from the summary of m1. I have learned that I need to invert the link function (here, the logit):

y = 1/(1 + exp(-(Intercept + b*x1 + c*x2)))
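
(Note that base R already provides this inverse logit as plogis(); here is a minimal check, using the rounded coefficients from the summary above and an arbitrary example point x1 = 0.5:)

eta <- -9.668 + 12.855 * 0.5 + 4.875 * mean(my_data$x2)  # linear predictor at x1 = 0.5
plogis(eta)           # built-in inverse logit
1 / (1 + exp(-eta))   # identical result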

To plot the regression curve over x1 in two dimensions, I set x2 = mean(x2) in the formula (which also seems important: the red line in the plot below ignores x2, apparently leading to considerable bias). The regression line:

xx <- seq(from = 0, to = 1, length.out = 100)
yy <- 1 / (1 + exp(-(-9.668 + 12.855 * xx + 4.875 * mean(x2))))  # x2 held at its mean
yyy <- 1 / (1 + exp(-(-9.668 + 12.855 * xx)))                    # x2 ignored

lines(yy ~ xx, col = "blue")
lines(yyy ~ xx, col = "red")

[Figure: the same scatter plot with the blue and red regression curves added]

I think the blue line does not look so good (and the red line worse, of course). So, as a side question: is y = 1/(1 + exp(-(Intercept + b*x1 + c*x2))) always the right back-transformation of the logit link? I am asking because I found https://sebastiansauer.github.io/convert_logit2prob/, which made me suspicious. Or is there another reason why the model fits so poorly? Maybe my data-creation process is somewhat 'bad'.
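
(One way to check the back-transformation is to compare the manual inverse logit against predict() on the response scale, where re.form = NA drops the random effects. This is a minimal sketch using the full-precision coefficients from fixef() rather than the rounded values above; p_manual and p_pred are just illustrative names:)

b <- fixef(m1)
p_manual <- plogis(b[1] + b[2] * xx + b[3] * mean(my_data$x2))
p_pred <- predict(m1, newdata = data.frame(x1 = xx, x2 = mean(my_data$x2)),
                  re.form = NA, type = "response")
all.equal(unname(p_manual), unname(p_pred))  # should be TRUE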

What I need now is to add the 95% confidence interval to the curve. I think that bootstrapping with the bootMer function should be a good approach. However, all the examples I found were for models with a single fixed effect. @Jamie Murphy asked a similar question, but he was interested in models containing a continuous and a categorical variable as fixed effects, here: Creating confidence intervals for regression curve in GLMM using Bootstrapping

But when it comes to models with more than one continuous variable as a fixed effect, I get lost. Perhaps someone can help solve this issue, possibly with a modification of the second part of this tutorial: https://www.r-bloggers.com/2015/06/confidence-intervals-for-prediction-in-glmms/
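
What I have in mind is something along these lines: a minimal parametric-bootstrap sketch in the spirit of that tutorial, with x2 held at its mean as above. Here new_dat, pred_fun and nsim = 500 are illustrative choices, not tested recommendations:

# Grid over x1, with x2 held at its mean (as for the blue curve above)
new_dat <- data.frame(x1 = seq(0, 1, length.out = 100), x2 = mean(my_data$x2))

# Fixed-effects-only predictions on the response (probability) scale
pred_fun <- function(fit) predict(fit, newdata = new_dat, re.form = NA, type = "response")

# Parametric bootstrap: refit the model to simulated responses nsim times
boot_res <- bootMer(m1, pred_fun, nsim = 500)

# Percentile-based 95% interval at each grid point (some refits may fail, hence na.rm)
ci <- apply(boot_res$t, 2, quantile, probs = c(0.025, 0.975), na.rm = TRUE)

plot(y ~ x1, my_data)
lines(new_dat$x1, pred_fun(m1), col = "blue")
lines(new_dat$x1, ci[1, ], lty = 2, col = "blue")
lines(new_dat$x1, ci[2, ], lty = 2, col = "blue")

But I am not sure whether this is the right way to generalize the tutorial to two continuous fixed effects.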

  • In your `lines` commands you have switched x and y. They need to be the other way around. – Roland Oct 02 '20 at 10:31
  • `library(ggeffects); pr <- ggpredict(m1, "x1", type = "fixed"); plot(pr, add.data = TRUE)` Might be of interest. Unfortunately, the confidence interval does seem to be a rough approximation and I'm not sure it takes the mixed-model structure well into account. – Roland Oct 02 '20 at 10:42
  • About the argument order in `lines` - are you sure? The regression lines fit much worse that way... – yenats Oct 02 '20 at 12:59
  • Yes, I'm sure... – Roland Oct 02 '20 at 21:07
  • I edited my question. I am quite unsure now why the model performs that badly. Perhaps the data I generated do not really fit the case I want to illustrate? – yenats Oct 06 '20 at 14:10
  • `ggpredict` might yield a good solution, thank you @Roland. I only get error messages when applying the function to an averaged model (which is my final goal): `Error in predict.averaging(model, newdata = data, type = type, se.fit = TRUE, : 'predict' for models '16', '14' and '8' caused errors`. I do not really know what to do about it, because the error is so generic. – yenats Oct 07 '20 at 10:28
  • I don't think model averaging is supported. I'm not sure marginal effects for ensemble models is even a thing. – Roland Oct 07 '20 at 10:33
  • So you mean: it might be the case that there is no way to get confidence intervals for ensemble models (such as averaged models) due to the fact that it simply doesn't make sense? – yenats Oct 07 '20 at 10:37
  • My issue is that I have no idea what a marginal effect of an ensemble model actually *is*. Without knowing that, discussion of CIs is moot. – Roland Oct 07 '20 at 10:43
  • 1
    I suggest you ask for advice at stats.stackexchange.com. At its core this is not a programming issue. – Roland Oct 07 '20 at 10:44
