I am doing statistical analysis of a dataset using a GLM in R. The predictor variables are: "Probe" (type of probe used in the experiment; factor with 4 levels), "Extraction" (type of extraction used in the experiment; factor with 2 levels), "Tank" (the number of the tank the sample was collected from; integers from 1 to 9), and "Dilution" (the dilution of each sample; numeric: 3.125, 6.25, 12.5, 25, 50, 100). The response is the number of positive responses ("Positive") obtained out of a number of repetitions of the experiment ("Rep"). I want to assess the effects of all predictor variables (and their interactions) on the number of positive responses, so I tried to fit a GLM like this:

y <- cbind(mydata$Positive, mydata$Rep - mydata$Positive)
model1 <- glm(y ~ Probe * Extraction * Dilution * Tank, family = quasibinomial, data = mydata)

But my supervisor later advised me that the "Tank" predictor should not be treated as a variable with meaningful levels: it takes values from 1 to 9, but these are just tank labels, so the difference between tank 1 and, say, tank 7 is not important. Treating this variable as an ordinary factor would only produce a large model with poor results. So how can I treat "Tank" as a random factor and include it in the GLM?

Thanks

Tung Linh

1 Answer

It is called a "mixed effect model". Check out the lme4 package.

library(lme4)
glmer(y~Probe + Extraction + Dilution + (1|Tank), family=binomial, data=mydata)

Also, you should probably use + instead of * to combine the predictors. * expands to every main effect plus every interaction among them, up to the highest-order term, which would give a huge, overfitted model. If you have a specific reason to believe that a particular interaction exists, code that interaction explicitly (with `:`) rather than using *.
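As a rough sketch of coding one interaction explicitly, the snippet below fits a main-effects model, then adds a single `Probe:Extraction` term and compares the two with a likelihood-ratio test. The data frame is simulated purely so the code runs on its own: the effect sizes, the 8 repetitions per row, and the choice of `Probe:Extraction` as the interaction are all assumptions, not taken from the question. Replace it with your real `mydata`.

```r
library(lme4)

# Simulated stand-in for mydata (hypothetical values, for illustration only)
set.seed(1)
mydata <- expand.grid(
  Probe      = factor(paste0("P", 1:4)),
  Extraction = factor(c("E1", "E2")),
  Tank       = 1:9,
  Dilution   = c(3.125, 6.25, 12.5, 25, 50, 100)
)
mydata$Rep <- 8                                  # assumed repetitions per row
tank_effect <- rnorm(9, sd = 0.5)                # assumed tank-to-tank variation
eta <- -2 + 0.03 * mydata$Dilution + tank_effect[mydata$Tank]
mydata$Positive <- rbinom(nrow(mydata), size = mydata$Rep, prob = plogis(eta))

# Tank is a label, not a quantity, so store it as a factor
mydata$Tank <- factor(mydata$Tank)

# Main effects only, with a random intercept for each tank
m_main <- glmer(cbind(Positive, Rep - Positive) ~ Probe + Extraction + Dilution +
                  (1 | Tank),
                family = binomial, data = mydata)

# Same model plus one explicitly coded two-way interaction
m_int <- glmer(cbind(Positive, Rep - Positive) ~ Probe + Extraction + Dilution +
                 Probe:Extraction + (1 | Tank),
               family = binomial, data = mydata)

# Likelihood-ratio test for the added interaction term
anova(m_main, m_int)
```

With predictors on very different scales, `glmer` may warn about rescaling; that is a warning rather than an error, and the models are still fitted.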

thc
  • Thank you, I will check that out. Yeah, I used the * because I wanted to test all the interactions between all parameters and see which ones are significant. After some deletion tests, only 3 interaction terms are left that are significant. But I wasn't sure at the start which ones they would be, so I had to use * – Tung Linh Apr 11 '17 at 07:37
  • Hi, I tried your solution, however, "quasi" cannot be used in `glmer`. I got this error after running the code `Error in lme4::glFormula(formula = y ~ Probe + Extraction + Dilution + : "quasi" families cannot be used in glmer` – Tung Linh Apr 11 '17 at 14:33
  • I needed quasi due to overdispersion present in my data. Is there a way to overcome/deal with this problem in the lme4 package? – Tung Linh Apr 11 '17 at 14:34
  • I found this thread: http://r.789695.n4.nabble.com/Question-on-overdispersion-tt3049898.html#a3050112. Basically, the author suggests that quasi-models are no longer required, and adding the random effect is statistically equivalent to adding overdispersion. – thc Apr 11 '17 at 19:37
  • @thc are you able to update your answer to the suggested approach? – baxx Mar 03 '19 at 18:52
  • I changed the family to binomial if that's what you mean. – thc Mar 03 '19 at 20:07
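
One commonly used way to absorb overdispersion in a binomial `glmer` fit, since "quasi" families are not available there, is an observation-level random effect: one random intercept per row of the data. This is not necessarily what the linked thread recommends; it is a minimal sketch of that general idea. The data are simulated with extra row-level noise (all values are assumptions), and the `obs` column is a hypothetical name introduced here.

```r
library(lme4)

# Simulated stand-in for mydata with extra row-level noise (hypothetical values)
set.seed(2)
mydata <- expand.grid(
  Probe      = factor(paste0("P", 1:4)),
  Extraction = factor(c("E1", "E2")),
  Tank       = 1:9,
  Dilution   = c(3.125, 6.25, 12.5, 25, 50, 100)
)
mydata$Rep <- 8
tank_effect <- rnorm(9, sd = 0.5)
row_noise   <- rnorm(nrow(mydata), sd = 1)       # source of the overdispersion
eta <- -2 + 0.03 * mydata$Dilution + tank_effect[mydata$Tank] + row_noise
mydata$Positive <- rbinom(nrow(mydata), size = mydata$Rep, prob = plogis(eta))
mydata$Tank <- factor(mydata$Tank)

# Plain binomial mixed model with a random intercept per tank
m0 <- glmer(cbind(Positive, Rep - Positive) ~ Probe + Extraction + Dilution +
              (1 | Tank),
            family = binomial, data = mydata)

# Rough overdispersion check: Pearson chi-square over residual degrees of freedom.
# Values well above 1 suggest overdispersion.
disp <- sum(residuals(m0, type = "pearson")^2) / df.residual(m0)
disp

# Observation-level random effect: one level per row ("obs" is a made-up name)
mydata$obs <- factor(seq_len(nrow(mydata)))
m_olre <- glmer(cbind(Positive, Rep - Positive) ~ Probe + Extraction + Dilution +
                  (1 | Tank) + (1 | obs),
                family = binomial, data = mydata)
summary(m_olre)
```

The estimated variance of `obs` in the summary then plays the role of the extra-binomial variation that a quasibinomial dispersion parameter would otherwise capture.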