
This is the example I am working on:

data2 = data.frame( X = c(0,2,4,6,8,10),
                    Y = c(300,220,210,90,80,10))
attach(data2)
model <- glm(log(Y)~X)
model

Call:  glm(formula = log(Y) ~ X)

Coefficients:
(Intercept)            X  
     6.0968      -0.2984  

Degrees of Freedom: 5 Total (i.e. Null);  4 Residual
Null Deviance:      7.742 
Residual Deviance: 1.509    AIC: 14.74

My question is:

Is there an option in the glm function that allows me to fix the intercept coefficient at a value I choose? And can I then predict the X value?

For example: I want my curve to start at the largest Y value, i.e. I want to change the intercept to log(300).

Cherif
    `model <- lm(log(Y)~X-1+offset(rep(log(300),nrow(data2))),data2)` ; `?predict` – Ben Bolker Dec 29 '14 at 13:01
  • Can I suppose here that I have Y = AX + B, with B = log(300) and A = model$coefficients? I have a biology background; all this is new to me. Thanks – Cherif Dec 29 '14 at 15:50

2 Answers


You are using glm(...) incorrectly, which IMO is a much bigger problem than offsets.

The main underlying assumption in least squares regression is that the error in the response is normally distributed with constant variance. If the error in Y is normally distributed, then the error in log(Y) most certainly is not. So, while you can "run the numbers" on a fit of log(Y) ~ X, the results will not be meaningful. The theory of generalized linear modelling was developed to deal with this problem. So using glm, rather than fitting log(Y) ~ X, you should fit Y ~ X with family=poisson. The former fits

log(Y) = b0 + b1x

while the latter fits

Y = exp(b0 + b1x)

In the latter case, if the error in Y is normally distributed, and if the model is valid, then the residuals will be normally distributed, as required. Note that these two approaches give very different results for b0 and b1.

fit.incorrect <- glm(log(Y)~X,data=data2)
fit.correct   <- glm(Y~X,data=data2,family=poisson)
coef(summary(fit.incorrect))
#               Estimate Std. Error  t value     Pr(>|t|)
# (Intercept)  6.0968294 0.44450740 13.71592 0.0001636875
# X           -0.2984013 0.07340798 -4.06497 0.0152860490
coef(summary(fit.correct))
#               Estimate Std. Error   z value     Pr(>|z|)
# (Intercept)  5.8170223 0.04577816 127.06982 0.000000e+00
# X           -0.2063744 0.01122240 -18.38951 1.594013e-75

In particular, the coefficient of X is almost 30% smaller when using the correct approach.
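
To put a number on that, here is a quick check using the two fits above (nothing new is assumed beyond the coefficients already shown):

# relative change in the slope estimate between the two fits
b1.inc <- coef(fit.incorrect)["X"]   # -0.298
b1.cor <- coef(fit.correct)["X"]     # -0.206
unname(1 - b1.cor/b1.inc)            # ~0.31, i.e. roughly 30% smaller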

Notice how the models differ:

plot(Y~X,data2)
curve(exp(coef(fit.incorrect)[1]+x*coef(fit.incorrect)[2]),
      add=T,col="red")
curve(predict(fit.correct,  type="response",newdata=data.frame(X=x)),
      add=T,col="blue")

The correct fit (blue curve) passes through the data more or less randomly, while the incorrect fit grossly overestimates the data for small X and underestimates it for larger X. I wonder if this is why you want to "fix" the intercept. Looking at the other answer, you can see that when you do fix Y0 = 300, the fit underestimates throughout.
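
One way to see this systematic error directly, rather than eyeballing the curves, is to plot response-scale residuals for both fits. A minimal sketch, reusing the fits above (the plotting choices are just illustrative):

# residuals on the original Y scale: fit.incorrect is on the log scale,
# so exponentiate its fitted values first
plot(data2$X, data2$Y - exp(fitted(fit.incorrect)),
     col="red", pch=19, xlab="X", ylab="response-scale residual")
points(data2$X, data2$Y - fitted(fit.correct), col="blue", pch=19)
abline(h=0, lty=2)   # the red points show a clear pattern; the blue ones less so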

In contrast, let's see what happens when we fix Y0 using glm properly.

data2$b0 <- log(300)   # add the offset as a separate column
# b0 not fixed
fit <- glm(Y~X,data2,family=poisson)
plot(Y~X,data2)
curve(predict(fit,type="response",newdata=data.frame(X=x)),
      add=TRUE,col="blue")
# b0 fixed so that Y0 = 300
fit.fixed <- glm(Y~X-1+offset(b0), data2, family=poisson)
curve(predict(fit.fixed,type="response",newdata=data.frame(X=x,b0=log(300))),
      add=TRUE,col="green")

Here, the blue curve is the unconstrained fit (done properly), and the green curve is the fit constraining Y0 = 300. You can see that they do not differ very much, because the correct (unconstrained) fit is already quite good.
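
You can also check this numerically by comparing the Y value each model implies at X = 0 (a quick check with the fits above):

exp(coef(fit)["(Intercept)"])   # unconstrained fit: about 336
predict(fit.fixed, type="response",
        newdata=data.frame(X=0, b0=log(300)))   # constrained: exactly 300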

jlhoward
    I would argue that log-transforming the response, i.e. assuming log-Normally distributed residuals, is *not* necessarily an error. – Ben Bolker Dec 29 '14 at 20:58
  • Yes, absolutely. If you know that the error is log-normally distributed, then the correct way is to use `lm(log(Y)~X)`, and using `glm(...)` as I have would be the incorrect approach. You would need to look at the Q-Q plots to see which is the correct way. But look at OP's data - the fit using poisson glm is much better. – jlhoward Dec 29 '14 at 21:23
  • I don't actually think so. Comparing `fit.correct <- glm(Y~X,data=data2, family=quasipoisson);coefplot2::coefplot2(list(fit.correct,fit.incorrect),intercept=TRUE)` (sorry, `coefplot2` isn't on CRAN any more) shows that the coefficients aren't really that different; "30% smaller", but -0.3 (SE 0.07) vs. -0.2 (SE 0.04) isn't actually a significant difference ... (note I'm using `quasipoisson` here) – Ben Bolker Dec 29 '14 at 22:23
  • This is a bit dense for a comment thread, but I think the coefficients of `X` are NSD because the estimate from the fit to `log(Y)~X` is relatively poor (p=0.015), while the estimate from the fit to `Y~X` using poisson glm is much better (p<2e-16). So, yes the CL around b1 for the "incorrect" fit does include the other b1. When I said the poisson glm fit was better I was reacting to the SSE. Defining this as sum((Y.obs-Y.fitted)^2), the poisson glm gives SSE=6611 whereas the `log(Y)~X` fit gives SSE=29081. – jlhoward Dec 30 '14 at 00:28
  • indeed, but for what it's worth the `p<2e-16` is due entirely to overdispersion -- this really needs a model that allows overdispersion. – Ben Bolker Dec 30 '14 at 00:48
  • I think fit.fixed corresponds most closely to my experience. All I need is a function y = ax + b with a known value of b, so I can draw my curve and find the x values for the y values I want. Just one more question, please: for example, for y = 50, is x = (log(50) - log(300))/fit.fixed$coefficients right? Thanks a lot – Cherif Dec 30 '14 at 12:53
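
The inversion in that last comment is correct for fit.fixed: that model is Y = 300*exp(b1*X), so X = (log(Y) - log(300))/b1. A minimal check, reusing fit.fixed from the answer above (x50 is just an illustrative name):

b1 <- coef(fit.fixed)                  # the single slope coefficient
x50 <- (log(50) - log(300))/b1         # X at which the fitted curve hits Y = 50
predict(fit.fixed, type="response",
        newdata=data.frame(X=x50, b0=log(300)))   # should return ~50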
data2 <- data.frame( X = c(0,2,4,6,8,10),
                     Y = c(300,220,210,90,80,10))
m1 <- lm(log(Y)~X-1+offset(rep(log(300),nrow(data2))),data2)

There is a predict() function, but here it's probably easier to just predict by hand.

par(las=1,bty="l")
plot(Y~X,data=data2)
curve(300*exp(coef(m1)*x),add=TRUE)
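
If you do want predict(), note that the model is on the log scale and the offset is baked into the formula, so exponentiate the result. A small sketch reusing m1 (x.for.y is just an illustrative helper name):

exp(predict(m1))                 # fitted Y values for the original data
# solving for X at a target Y, since log(Y) = log(300) + b1*X:
x.for.y <- function(y) unname((log(y) - log(300))/coef(m1))
x.for.y(50)                      # X at which the fitted curve reaches Y = 50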

[plot: the data2 points with the fitted curve 300*exp(coef(m1)*x)]

For what it's worth, if you want to compare log-Normal and Poisson models, you can do it via

library("ggplot2")
 theme_set(theme_bw())
 ggplot(data2,aes(X,Y))+geom_point()+
    geom_smooth(method="glm",family=quasipoisson)+
    geom_smooth(method="glm",family=quasi(link="log",var="mu^2"),
      colour="red",fill="red")

[plot: the two smooths, quasipoisson (default colour) and quasi with log link and mu^2 variance (red)]

Ben Bolker