
I am running a binomial logistic regression with a logit link function in R. My response is binary [0/1] and I have two multi-level factor predictors - let's call them a and b, where a has 4 factor levels (a1, a2, a3, a4) and b has 9 factor levels (b1, b2, ..., b9). Therefore:

mod <- glm(y ~ a + b, family = binomial(logit), data = pretend)
summary(mod)

The model output would then show all the information about the model as well as the coefficients.

There is a factor level missing for both a and b (a1 and b1) in the summary output. I understand that they are absorbed into the "intercept" of the model. I have read that if I want to remove the intercept term and see the estimates for those factor levels, I can just add -1 or +0 to the model formula, i.e.:

mod2 <- glm(y ~ a + b - 1, family = binomial(logit), data = pretend)

...OR...

mod2 <- glm(y ~ a + b + 0, family = binomial(logit), data = pretend)
summary(mod2)

In the new model (mod2) the intercept term is gone and variable a's factor level a1 now appears amongst the coefficients. But variable b's factor level b1 is still missing, and given that there is no intercept term anymore, how can I interpret the odds ratio for that factor level?

Could someone please explain to me how to get the coefficient for b1 too and why this is happening?

Thank you.

MiMi
  • The dropped level's coefficient is zero. In mod, the reported coefficients for a are the differences a_i - a1 and the coefficients for b are the differences b_i - b1, so a1 and b1 each have a coefficient of zero (because glm uses "contr.treatment" by default). In mod2, b1's coefficient is truly zero because you give no intercept. I think this isn't a programming topic. – cuttlefish44 Jun 03 '16 at 13:16
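To illustrate the comment above, here is a minimal sketch (with a simulated data frame standing in for the question's pretend data, which is not shown): under R's default "contr.treatment" coding the first level of each factor is the baseline, so its coefficient is implicitly zero.

# minimal sketch; simulated data stands in for the question's 'pretend' data frame
set.seed(1)
pretend <- data.frame(
  y = rbinom(200, 1, 0.5),
  a = factor(sample(paste0("a", 1:4), 200, replace = TRUE)),
  b = factor(sample(paste0("b", 1:9), 200, replace = TRUE))
)

contrasts(pretend$a)   # treatment contrasts: the a1 row is all zeros (baseline)
mod <- glm(y ~ a + b, family = binomial(logit), data = pretend)
coef(mod)              # no aa1 or bb1 term: both are fixed at 0 and absorbed into the intercept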

3 Answers


Why do you want to remove the intercept term and get the coefficient for a1?

A logistic regression model with a factor variable is fitted with the first factor level as the reference. The odds ratio for this reference level is then fixed at 1.0 (its coefficient, the log odds ratio, is 0).

When comparing log odds between factor levels (or groups), all reported coefficients refer to that reference level. Hence you can calculate odds ratios between different groups and see whether the event is more or less likely to occur (compared to the reference factor level).
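For instance, a quick sketch (reusing the mod object from the question): exponentiating the coefficients gives each level's odds ratio relative to the baseline (a1 / b1).

# odds ratios of each level vs. the baseline, with Wald confidence intervals
exp(coef(mod))
exp(confint.default(mod))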

I do not know what serves as the reference for the levels of a if there is no reference level in a anymore. If the reference for a is b1, then how do you interpret this? Is there any reference showing that removing the intercept makes sense? (Really curious, I have not heard about this approach yet.)

Btw, you do not need the intercept to calculate odds ratios among factor levels. Here is a small example calculating odds ratios from a binomial glm:

library(oddsratio)
fit.glm <- glm(admit ~ gre + gpa + rank, data = data.glm, family = "binomial") # fit model

# Calculate OR for specific increment step of continuous variable
calc.oddsratio.glm(data = data.glm, model = fit.glm, incr = list(gre = 380, gpa = 5))

predictor oddsratio CI.low (2.5 %) CI.high (97.5 %)          increment
1     gre     2.364          1.054            5.396                380
2     gpa    55.712          2.229         1511.282                  5
3   rank2     0.509          0.272            0.945 Indicator variable
4   rank3     0.262          0.132            0.512 Indicator variable
5   rank4     0.212          0.091            0.471 Indicator variable
pat-s

It's interesting that a1 is given. One would expect one factor level to serve as the 'reference' and therefore not to have any OR in the output (because it is 1.0).

I think b1 is your reference, therefore hidden, and therefore 1.0.
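One way to see an explicit estimate involving b1 (a sketch, assuming b is stored as a factor in the pretend data) is to relevel b so that another level becomes the reference; b1 then shows up in the summary as a contrast against that new reference.

# make b2 the reference level; the printed bb1 coefficient is then the log odds ratio of b1 vs. b2
pretend$b <- relevel(pretend$b, ref = "b2")
mod3 <- glm(y ~ a + b, family = binomial(logit), data = pretend)
summary(mod3)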

Jasper
  • I have just rerun the model and swapped a and b around so that: mod2 <- glm(y ~ b + a - 1, family = binomial(logit), data = pretend). Then b1 appears in the output but a1 is missing. But the strange/interesting part is: the coefficient and standard error for a1 in the first instance and for b1 in this second instance are EXACTLY the same. Huh??? Why? I'm a bit stumped at the moment. – MiMi Jun 03 '16 at 10:00

You can try adjusting the contrasts. My favorites are

options(contrasts = c('contr.sum','contr.poly'))

Here the constraint is that the a_i coefficients sum to 0 and the b_i coefficients sum to 0 (though it just occurred to me that this may not hold exactly for a GLM). With those contrasts, the last level of a and of b is usually left off the output because it can be recovered by taking the negative of the sum of the other a or b coefficients, respectively (since they all sum to 0); see the sketch below.
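A sketch of what that recovery could look like (hypothetical, reusing the pretend data from the question):

# refit with sum-to-zero contrasts; the last level of each factor is dropped from the printout
options(contrasts = c("contr.sum", "contr.poly"))
mod.sum <- glm(y ~ a + b, family = binomial(logit), data = pretend)
cf <- coef(mod.sum)

# the omitted level a4 is the negative of the sum of the printed a coefficients
a4.effect <- -sum(cf[c("a1", "a2", "a3")])
a4.effect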

Check out this question for more reference: https://stats.stackexchange.com/questions/162381/how-to-fit-a-glm-with-sum-to-zero-constraints-in-r-no-reference-level

Bryan Goggin