0

I am trying to fit a generalized linear mixed-effects model to my data, using the lme4 package.

The data can be described as follows (see example below): Survival data of fish over 28 days. Explanatory variables in the example data set are:

  • Region This is the geographical region from which the larvae originated.
  • treatment The temperatures at which sub-samples of fish from each region were raised.
  • replicate One of three replications of the entire experiment
  • tub Random variable. 15 tubs (used to maintain experimental temperatures in aquaria) in total (3 replicates for each of 5 temperature treatments). Each tub contained 1 aquaria for each Region (4 aquaria in total) and was located randomly in the lab.
  • Day Self explanatory, number of days from the start of the experiment.
  • stage is not being used in the analysis. Can be ignored.

Response variable

  • csns cumulative survival. i.e remaining fish/initial fish at day 0.
  • start weights used to tell the model that the probability of survival is relative to this number of fish at start of experiment.
  • aquarium Second random variable. This is the unique ID for each individual aquaria containing the value of each factor that it belongs to. e.g. N-14-1 means Region N, Treatment 14, replicate 1.

My problem is unusual, in that I have fitted the following model before:

   dat.asr3<-glmer(csns~treatment+Day+Region+
      treatment*Region+Day*Region+Day*treatment*Region+
      (1|tub)+(1|aquarium),weights=start, 
      family=binomial, data=data2)  

However, now that I am attempting to re-run the model, to generate analyses for publication, I am getting the following errors with the same model structure and package. Output is listed below:

> Warning messages:  

1: In eval(expr, envir, enclos) : non-integer #successes in a binomial glm!
2: In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
Model failed to converge with max|grad| = 1.59882 (tol = 0.001, component >1)
3: In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
Model is nearly unidentifiable: very large eigenvalue
- Rescale variables?;Model is nearly unidentifiable: large eigenvalue ratio
- Rescale variables?

My understanding is the following:

Warning message 1.

non-integer #success in a binomial glm refers to the proportion format of the csns variable. I have consulted several sources, here included, github, r-help, etc, and all suggested this. The research fellow that assisted me in this analysis 3 years ago, is unreachable. Can it have to do with changes in lme4 package over the last 3 years?

Warning message 2.

I understand this is a problem because there are insufficient data points to fit the model to, particularly at
L-30-1, L-30-2 and L-30-3,
where only two observations are made:
Day 0 csns=1.00 and Day 1 csns=0.00
for all three aquaria. Therefore there is no variability or sufficient data to fit the model to.

Nevertheless, this model in lme4 has worked before, but doesn't run without these warnings now.

Warning message 3

This one is entirely unfamiliar to me. Never seen it before.

Sample data:

Region treatment replicate tub Day stage csns start aquarium  
 N        14         1  13   0    1 1.00   107   N-14-1
 N        14         1  13   1    1 1.00   107   N-14-1
 N        14         1  13   2    1 0.99   107   N-14-1
 N        14         1  13   3    1 0.99   107   N-14-1
 N        14         1  13   4    1 0.99   107   N-14-1
 N        14         1  13   5    1 0.99   107   N-14-1

The data in question 1005cs.csv is available here via we transfer: http://we.tl/ObRKH0owZb

Any help with deciphering this problem, would be greatly appreciated. Also any alternative suggestions for suitable packages or methods to analyse this data would be great too.

Ben Bolker
  • 211,554
  • 25
  • 370
  • 453
Daniel Svozil
  • 85
  • 1
  • 1
  • 9
  • 1
    @RichieCotton: Proportions can be modelled with the binomial distribution, if the number of trials is know. Then the `weights`argument must be given. See also http://stats.stackexchange.com/questions/87956/r-lme4-how-to-apply-binomial-glm-to-percentages-rather-than-yes-no-counts I see no problem with this... – EDi Dec 15 '15 at 09:02
  • I think we need a reproducible example here... Gives the model sensible results? – EDi Dec 15 '15 at 09:02
  • I have used `weights` argument according to documentation for `lme4`. This is precisely why it is so perplexing. There should be no issue, but there is. What is the best way to send the data in question? It is a large .csv file with ~1250 rows...the model results are rather nonsensical, mind you, according to me anyway. – Daniel Svozil Dec 15 '15 at 09:06
  • The usual method is to post a link to a public "dropbox" of some sort. Google-docs is one. Dropbox.com is another. although in recent years it has adopted an annoying registration process that puts items out-of-reach for those of us who don't trust them anymore. – IRTFM Dec 15 '15 at 19:11
  • The data in question 1005cs.csv is available here via we transfer: http://we.tl/ObRKH0owZb @EDi – Daniel Svozil Dec 16 '15 at 00:16

1 Answers1

3

tl;dr the "non-integer successes" warning is accurate; it's up to you to decide whether fitting a binomial model to these data really makes sense. The other warnings suggest that the fit is a bit unstable, but scaling and centering some of the input variables can make the warnings go away. It's up to you, again, to decide whether the results from these different formulations are different enough for you to worry about ...

data2 <- read.csv("1005cs.csv")
library(lme4)

Fit model (slightly more compact model formulation)

dat.asr3<-glmer(
   csns~Day*Region*treatment+
        (1|tub)+(1|aquarium),
            weights=start, family=binomial, data=data2)

I do get the warnings you report.

Let's take a look at the data:

library(ggplot2); theme_set(theme_bw())
ggplot(data2,aes(Day,csns,colour=factor(treatment)))+
        geom_point(aes(size=start),alpha=0.5)+facet_wrap(~Region)

enter image description here

Nothing obviously problematic here, although it does clearly show that the data are very close to 1 for some treatment combinations, and that the treatment values are far from zero. Let's try scaling & centering some of the input variables:

data2sc <- transform(data2,
                     Day=scale(Day),
                     treatment=scale(treatment))
dat.asr3sc <- update(dat.asr3,data=data2sc)

Now the "very large eigenvalue" warning is gone, but we still have the "non-integer # successes" warning, and a max|grad|=0.082. Let's try another optimizer:

dat.asr3scbobyqa <- update(dat.asr3sc,
                     control=glmerControl(optimizer="bobyqa"))

Now only the "non-integer #successes" warning remains.

d1 <- deviance(dat.asr3)
d2 <- deviance(dat.asr3sc)
d3 <- deviance(dat.asr3scbobyqa)
c(d1,d2,d3)
## [1] 12597.12 12597.31 12597.56

These deviances don't differ by very much (0.44 on the deviance scale is more than could be accounted for by round-off error, but not much difference in goodness of fit); actually, the first model gives the best (lowest) deviance, suggesting that the warnings are false positives ...

resp <- with(data2,csns*start)
plot(table(resp-floor(resp)))

enter image description here

This makes it clear that there really are non-integer responses, so the warning is correct.

Ben Bolker
  • 211,554
  • 25
  • 370
  • 453
  • Right. @Ben Bolker, I still have a couple of questions, although cleared the errors up. 1) Is there a simple explanation for why these errors are arising now and did not arise before (about 3 years ago)? As I said in my original post, these errors didn't show when I was using R (v2.8 or thereabouts), with the exact same model, package and code. 2) Is the the `non-integer #successes` error something to be concerned about? If yes, how can I potentially overcome it? If I decide it doesn't make sense to use proportions, how can I express the data differently, that won't generate the error? – Daniel Svozil Dec 22 '15 at 03:39
  • (1) we've added a lot of diagnostic warnings to lme4 in the last three years -- you're almost definitely using a newer version of lme4. (2) Yes. If you have the original sample sizes from which these proportions were drawn, add them as a `weights` argument. Otherwise, you may need to search/ask another question here or on [Cross Validated](http://stats.stackexchange.com) – Ben Bolker Dec 22 '15 at 16:38