R: bootstrapped mixed model binary logistic regression

Question

I need to bootstrap my mixed model binary logistic regression. The model itself works fine (and is approved and corrected by an expert friend), but the bootstrapped version is buggy. The bootstrapped version was previously approved by another expert friend (in CrossValidated but later mods removed my post saying it does not belong on CrossValidated). But the same code happened to work for a simple fixed-effects multiple logistic regression (although in that case too there were lots of warnings similar to the warnings here [except this single warning which is for the lmer() function: "In mer_finalize(ans) : false convergence (8)").

Could you please let me know where the error resides and how to debug it?

Many thanks.

My code is (I temporarily kept the replicate numbers too low to debug the code):

library(boot)
library(lme4)

mixedGLM <- function(formula, data, indices) {
        d <- data[indices, ]
        (fit <- lmer(DV ~ (Demo1 +Demo2+Demo3 +Demo4 +Trt)^2 
                     + (1 | PatientID) + (0 + Trt | PatientID)
                     , family=binomial(logit), d))
        return(coef(fit))
      }

results <- boot(data=MixedModelData4 , statistic = mixedGLM, R= 2, formula= DV~Demo1 +Demo2 +Demo3 +Demo4 +Trt)

. . . My errors are:

Error in t.star[r, ] <- res[[r]] : 
  incorrect number of subscripts on matrix
In addition: Warning messages:
1: In mer_finalize(ans) : false convergence (8)
2: glm.fit: algorithm did not converge 
3: glm.fit: fitted probabilities numerically 0 or 1 occurred 
4: glm.fit: fitted probabilities numerically 0 or 1 occurred 
5: In mer_finalize(ans) : false convergence (8)

. . . Also could you please tell me how to make the boot() function give P values too??! It just gives beta and SE and bias and CI, but I need the P values too.

Many thanks.

--------------------------------------------------- Developing Story -----------------------------------------------------

Ok I gladly ran the nice code of Henrik. But the code did not quite finish running. First it gave this error:

Fitting 17 lmer() models:
[...
Error: pwrssUpdate did not converge in 30 iterations
In addition: Warning message:
In mixed(DV ~ (Demo1 + Demo2 + Demo3 + Demo4 + Trt)^2 + (1 | PatientID) +  :
  Due to missing values, reduced number of observations to 90
> (results2 <- mixed(DV ~ (Demo1 +Demo2+Demo3 +Demo4 +Trt)^2
+ results3 <- mixed(DV ~ (Demo1 +Demo2+Demo3 +Demo4 +Trt)^2

Then I removed the first parentheses block and revised the syntax to this one:

results3 <- mixed(DV ~ (Demo1 +Demo2+Demo3 +Demo4 +Trt)^2 
                 + (0 + Trt | PatientID),
                 family=binomial(logit), data = MixedModelData4,
                 method = "PB", args.test = list(nsim = 2))

This time the test passed the first step (fitting the models) but failed at obtaining P values, again giving the same errors and warnings:

Fitting 17 lmer() models:
[.................]
Obtaining 16 p-values:
[....
Error: pwrssUpdate did not converge in 30 iterations
In addition: Warning messages:
1: In mixed(DV ~ (Demo1 + Demo2 + Demo3 + Demo4 + Trt)^2 + (0 + Trt |  :
  Due to missing values, reduced number of observations to 90
2: In (function (fn, par, lower = rep.int(-Inf, n), upper = rep.int(Inf,  :
  failure to converge in 10000 evaluations
3: In (function (fn, par, lower = rep.int(-Inf, n), upper = rep.int(Inf,  :
  failure to converge in 10000 evaluations
4: In (function (fn, par, lower = rep.int(-Inf, n), upper = rep.int(Inf,  :
  failure to converge in 10000 evaluations
5: In (function (fn, par, lower = rep.int(-Inf, n), upper = rep.int(Inf,  :
  failure to converge in 10000 evaluations
6: In (function (fn, par, lower = rep.int(-Inf, n), upper = rep.int(Inf,  :
  failure to converge in 10000 evaluations

I have no idea how to debug it, or if the problem is my dataset? I should add that my dataset is fully mean-centered (all variables). The DV is only negated (since mean centering disallowed R to work and negating would do the same for a binary outcome).

---------------------------------------------------------- Update -------------------------------------------------------------

I changed the PB value of METHOD to LRT (as Henrik recommended) and the process of fitting the models finished but the process of obtaining the P values didn't start:

> results4 <- mixed(DV ~ (Demo1 +Demo2+Demo3 +Demo4 +Trt)^2 
+                   + (0 + Trt | PatientID),
+                   family=binomial(logit), data = MixedModelData4,
+                   method = "LRT", args.test = list(nsim = 2))
Fitting 17 lmer() models:
[.................]
Warning message:
In mixed(DV ~ (Demo1 + Demo2 + Demo3 + Demo4 + Trt)^2 + (0 + Trt |  :
  Due to missing values, reduced number of observations to 90

It turned out the P values are not obtained by bootstrapping when LRT is being used. Therefore, the results were already ready (although non-bootstrapped).

Thanks a lot. Are you talking about this statement: "results <- boot()"? Then I should add a vector of indices... I couldn't clearly understand why mixedGLM argument was not already defined? Could you please detail? My final question is that from which part of the formula or the bootstrap code you inferred that I am destroying the correlations? If you want more details on my design, please see here: http://www.talkstats.com/showthread.php/48505-Binary-mixed-model-logistic-regression-using-lmer()-of-lme4-for-multilevel-analysis?p=137355&viewfull=1#post137355 — Vic, Aug 25 '13 at 12:07
Please note that I don't have a real repeated-measures, but a pseudo-replication. My patients are repeated in the long-format dataset. So I don't know if I even have a real correlation between repeated measures or a 100% pseudo-correlation? My Demo variables are at patient level, but the Trt (treatment) is at treatment level, repeated for each patient with the real medication and the placebo... for details, check that link please. — Vic, Aug 25 '13 at 12:10
Uhu I see thank you so much. I agree it surprisingly doesn't define the "formula" argument, but I have modified this function from a website and it actually works for fixed-effects binary logit. Perhaps in that case too it works incorrectly. But the site is legitimate. I have elaborated on that function and the website I am talking about here: http://www.talkstats.com/showthread.php/46779-Bootstrapped-binary-logistic-regression-in-R — Vic, Aug 25 '13 at 12:54
However I would try to run the model on the wide version (I have it too but I don't know how exactly can I distinguish treatment and placebo in the wide format, since in it, these two have different columns rather than being two levels of the same variable: treatment). — Vic, Aug 25 '13 at 12:56
About the study design, thanks for your attention. It is a little bit confidential so I cannot detail on that, but I can say we treated subjects with placebo and real medication simultaneously (topical medications can be applied locally ;) ). So technically it is possible to do so, since the disease was local and could be find at different points of a single person). — Vic, Aug 25 '13 at 13:01
A few comments: (1) it would be a really good idea to check out the development (soon-to-be-released) version of `lme4`, which has some built-in capabilities [`bootMer` and `confint(...,method="boot")`] and (2) the `refit()` function that speeds things up a lot; (3) it is very common to see failures on some bootstrap replicates. — Ben Bolker, Aug 25 '13 at 16:13
actually, I see from comments below that you are already working with fairly recent versions. — Ben Bolker, Aug 25 '13 at 16:15
Thanks a lot Ben, yes as Henrik stated I downloaded the developer versions not the CRAN ones (I had CRAN ones but deleted them beforehand). — Vic, Aug 25 '13 at 16:39
I see Ben is a package developer himself! Nice to be among genius people. — Vic, Aug 26 '13 at 09:40
Dear Henrik I now understand (from what Randel told me) that the reason my new model differs a lot with the previous one is that I have removed the random intercept!! (1 | PatientID) ------ although I had no other choice because the code didn't work otherwise. — Vic, Aug 26 '13 at 09:41

score 4 · Accepted Answer · answered Aug 25 '13 at 12:12

4

If you want p-values from a GLMM with a parametric bootstrap you can use function mixed from package afex which obtains them via pbkrtest::PBmodcomp:

library(afex)
results <- mixed(DV ~ (Demo1 +Demo2+Demo3 +Demo4 +Trt)^2 
                     + (1 | PatientID) + (0 + Trt | PatientID),
                     family=binomial(logit), data = d,
                     method = "PB", args.test = list(nsim = 1000))

You could even first define a local cluster (i.e., use multiple cores):

cl <- makeCluster(rep("localhost", 4))
results <- mixed(DV ~ (Demo1 +Demo2+Demo3 +Demo4 +Trt)^2 
                     + (1 | PatientID) + (0 + Trt | PatientID),
                     family=binomial(logit), data = d,
                     method = "PB", args.test = list(nsim = 1000, cl = cl))

It is probably the best to install the development versions of all three packages (as the current version of pbkrtest is designed for lme4 1.0 which is not yet on cran):

afex from r-forge: install.packages("afex", repos="http://R-Forge.R-project.org")
pbkrtest from: http://people.math.aau.dk/~sorenh/software/pbkrtest/devel/
lme4 from github: https://github.com/lme4/lme4

answered Aug 25 '13 at 12:12

Henrik

14,202
10
68
91

I just can say Awesome! :) I am installing R3 and trying the packages you kindly linked to. – Vic Aug 25 '13 at 12:49
the development lme4 was successfully installed from github, but now afex cannot find it, although both are successfully installed (all three are successfully installed with all pre-requirements)... still working on it! – Vic Aug 25 '13 at 13:57
Uh I see the problem is in the previous lme4 which is only partially deleted :Warning in install.packages : unable to move temporary installation ‘C:\Program Files\R\R-3.0.1\library\filead86d9a23d\lme4’ to ‘C:\Program Files\R\R-3.0.1\library\lme4’:-------------- fixed. – Vic Aug 25 '13 at 13:59
I tested your very nice code, but apparently my data is not very good. or there is some problem somewhere. this error is retured: [code] "Error: pwrssUpdate did not converge in 30 iterations In addition: Warning message: In mixed(DV ~ (Demo1 + Demo2 + Demo3 + Demo4 + Trt)^2 + (1 | PatientID) + : Due to missing values, reduced number of observations to 90" [/code] – Vic Aug 25 '13 at 14:06
1

This problem is know and the author of `pbkrtest` is working on it: http://thread.gmane.org/gmane.comp.lang.r.lme4.devel/10509/focus=10518 But it means, your model/data is not really well behaved. Does it fail on a particular model? ALternatively try `method="LRT"` instead of `"PB"` and use PBmodcomp on only the interesting comparisons. – Henrik Aug 25 '13 at 15:00
Many thanks Henrik. :) My design is detailed on in the response to user20650 (here), and from this link: http://www.talkstats.com/showthread.php/48505-Binary-mixed-model-logistic-regression-using-lmer()-of-lme4-for-multilevel-analysis – Vic Aug 25 '13 at 15:16
again a similar error occurred "Fitting 17 lmer() models: [.................] Warning message: In mixed(DV ~ (Demo1 + Demo2 + Demo3 + Demo4 + Trt)^2 + (0 + Trt | : Due to missing values, reduced number of observations to 90"... I am embedding it in my post. I revised the code a little bit. I would appreciate if you could kindly checked my revision. I removed one of the parentheses blocks so that it can run. – Vic Aug 25 '13 at 15:23
@Vic A warning is no error, results 4 exists. It only tells you that there are missing values in your data which will be removed. To avoid this warning, remove the rows with missing values beforehands (there can't be any NAs in any of the variables in your model). The problem in your data is that you are trying to assess 17 effects with only 90 observations. You should think about reducing your model. – Henrik Aug 25 '13 at 15:45
WOW THANKS A LOTTTTTT :) I WILL DO IT RIGHT NOW :) – Vic Aug 25 '13 at 16:03
1

btw, `method = "LRT"` does not compute p-values based on parametric bootstrap but based on likelihood ratio tests. – Henrik Aug 25 '13 at 16:04
Oh I had forgotten to ask for seeing the vector results by typing it or by summary() !! Sorry sorry !! :) THANK YOUUUUUUUUU. I reduced the model and removed that single empty row and now no warnings thanks. Now I want to give a more comprehensive model (with a couple of important interactions) a try. – Vic Aug 25 '13 at 16:51
It worked even with all the interactions, and the results are LOVABLE :) THANK YOU THANK YOU THANK YOU! – Vic Aug 25 '13 at 16:55
Only one question: is my resampling correct in the first place, given the format of my dataset? My dataset consists of 2n observations (2n cases) within n patients. So I think you call it "long" format as user20650 stated. Is such a resampling valid? This is a question of mine since the P values improve surprisingly after bootstrapping; so I fear perhaps I am doing the bootstrapping wrong – Vic Aug 25 '13 at 17:26
Or maybe the jump in the P values in due to the LRT method, but anyways the jumps are so welcomed!! – Vic Aug 25 '13 at 17:30
1

@vic If you use `method = "LRT"` you are not bootstrapping. Only `method = "PB"` uses bootstrap. If you use thise with a sufficiently large sample (> 1000) this is your bootstrap. – Henrik Aug 25 '13 at 18:25
Oh. But the results of LRT were extremely different from and better than the lmer() results of the same model I have. So can you have a look at my main lmer() model and say is it correct? I would appreciate very much. http://www.talkstats.com/showthread.php/48505-Binary-mixed-model-logistic-regression-using-lmer()-of-lme4-for-multilevel-analysis?p=137370&viewfull=1#post137370 – Vic Aug 25 '13 at 18:44
Henrik, I reduced my regression to "DV ~ Trt + (1 | PatientID) + (0 + Trt | PatientID) but I am still receiving this error: "Error: pwrssUpdate did not converge in 30 iterations"... I tried any possible combination of variables. The "Only" solution is to remove the random effect intercept (1 | PatientID). Otherwise the code won't work at all. Do you have any ideas? – Vic Aug 26 '13 at 14:18
1

@Vic as I said, the author of `pbkrtest` knows about this problem and is working on it (but I don't know when thre will be a solution). So long, you can use the old version of `lme4` (< 1) and pbkrtest (0.3.4 from the CRAN Archives: http://cran.r-project.org/src/contrib/Archive/pbkrtest/). This should work better. – Henrik Aug 26 '13 at 14:41

R: bootstrapped mixed model binary logistic regression

1 Answers1

Linked