
This is the example I am working on:

data2 = data.frame( X = c(0,2,4,6,8,10),
                    Y = c(300,220,210,90,80,10))
attach(data2)
model <- glm(log(Y)~X)
model

Call:  glm(formula = log(Y) ~ X)

Coefficients:
(Intercept)            X  
     6.0968      -0.2984  

Degrees of Freedom: 5 Total (i.e. Null);  4 Residual
Null Deviance:      7.742 
Residual Deviance: 1.509    AIC: 14.74

My question is:

Is there an option in the glm function that allows me to fix the intercept coefficient at a value I choose? And can I then predict the X value?

For example: I want my curve to start at the largest Y value, i.e. I want to change the intercept to log(300).

Cherif
    `model <- lm(log(Y)~X-1+offset(rep(log(300),nrow(data2))),data2)` ; `?predict` – Ben Bolker Dec 29 '14 at 13:01
  • Can I suppose here that I have Y = AX + B, with B = log(300) and A = model$coefficients? I have a biology background; all this is new to me. Thanks – Cherif Dec 29 '14 at 15:50

2 Answers


You are using glm(...) incorrectly, which IMO is a much bigger problem than offsets.

The main underlying assumption in least squares regression is that the error in the response is normally distributed with constant variance. If the error in Y is normally distributed, then the error in log(Y) most certainly is not. So, while you can "run the numbers" on a fit of log(Y) ~ X, the results will not be meaningful. The theory of generalized linear modelling was developed to deal with this problem. So using glm, rather than fitting log(Y) ~ X, you should fit Y ~ X with family=poisson. The former fits

log(Y) = b0 + b1x

while the latter fits

Y = exp(b0 + b1x)

In the latter case, if the error in Y is normally distributed, and if the model is valid, then the residuals will be normally distributed, as required. Note that these two approaches give very different results for b0 and b1.

fit.incorrect <- glm(log(Y)~X,data=data2)
fit.correct   <- glm(Y~X,data=data2,family=poisson)
coef(summary(fit.incorrect))
#               Estimate Std. Error  t value     Pr(>|t|)
# (Intercept)  6.0968294 0.44450740 13.71592 0.0001636875
# X           -0.2984013 0.07340798 -4.06497 0.0152860490
coef(summary(fit.correct))
#               Estimate Std. Error   z value     Pr(>|z|)
# (Intercept)  5.8170223 0.04577816 127.06982 0.000000e+00
# X           -0.2063744 0.01122240 -18.38951 1.594013e-75

In particular, the coefficient of X is almost 30% smaller when using the correct approach.
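
To put a number on that, here is a quick check using the two fits above (nothing new is assumed beyond the coefficients already shown):

# relative change in the slope estimate between the two fits
b1.inc <- coef(fit.incorrect)["X"]   # -0.298
b1.cor <- coef(fit.correct)["X"]     # -0.206
unname(1 - b1.cor/b1.inc)            # ~0.31, i.e. roughly 30% smaller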

Notice how the models differ:

plot(Y~X,data2)
curve(exp(coef(fit.incorrect)[1]+x*coef(fit.incorrect)[2]),
      add=T,col="red")
curve(predict(fit.correct,  type="response",newdata=data.frame(X=x)),
      add=T,col="blue")

The correct fit (blue curve) passes through the data more or less randomly, while the incorrect fit grossly overestimates the data for small X and underestimates it for larger X. I wonder if this is why you want to "fix" the intercept. Looking at the other answer, you can see that when you do fix Y0 = 300, the fit underestimates throughout.
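
One way to see this systematic error directly, rather than eyeballing the curves, is to plot response-scale residuals for both fits. A minimal sketch, reusing the fits above (the plotting choices are just illustrative):

# residuals on the original Y scale: fit.incorrect is on the log scale,
# so exponentiate its fitted values first
plot(data2$X, data2$Y - exp(fitted(fit.incorrect)),
     col="red", pch=19, xlab="X", ylab="response-scale residual")
points(data2$X, data2$Y - fitted(fit.correct), col="blue", pch=19)
abline(h=0, lty=2)   # the red points show a clear pattern; the blue ones less so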

In contrast, let's see what happens when we fix Y0 using glm properly.

data2$b0 <- log(300)   # add the offset as a separate column
# b0 not fixed
fit <- glm(Y~X,data2,family=poisson)
plot(Y~X,data2)
curve(predict(fit,type="response",newdata=data.frame(X=x)),
      add=TRUE,col="blue")
# b0 fixed so that Y0 = 300
fit.fixed <- glm(Y~X-1+offset(b0), data2, family=poisson)
curve(predict(fit.fixed,type="response",newdata=data.frame(X=x,b0=log(300))),
      add=TRUE,col="green")

Here, the blue curve is the unconstrained fit (done properly), and the green curve is the fit constraining Y0 = 300. You can see that they do not differ very much, because the correct (unconstrained) fit is already quite good.
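
You can also check this numerically by comparing the Y value each model implies at X = 0 (a quick check with the fits above):

exp(coef(fit)["(Intercept)"])   # unconstrained fit: about 336
predict(fit.fixed, type="response",
        newdata=data.frame(X=0, b0=log(300)))   # constrained: exactly 300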

jlhoward
    I would argue that log-transforming the response, i.e. assuming log-Normally distributed residuals, is *not* necessarily an error. – Ben Bolker Dec 29 '14 at 20:58
  • Yes, absolutely. If you know that the error is log-normally distributed, then the correct way is to use `lm(log(Y)~X)`, and using `glm(...)` as I have would be the incorrect approach. You would need to look at the Q-Q plots to see which is the correct way. But look at OP's data - the fit using poisson glm is much better. – jlhoward Dec 29 '14 at 21:23
  • I don't actually think so. Comparing `fit.correct <- glm(Y~X,data=data2, family=quasipoisson);coefplot2::coefplot2(list(fit.correct,fit.incorrect),intercept=TRUE)` (sorry, `coefplot2` isn't on CRAN any more) shows that the coefficients aren't really that different; "30% smaller", but -0.3 (SE 0.07) vs. -0.2 (SE 0.04) isn't actually a significant difference ... (note I'm using `quasipoisson` here) – Ben Bolker Dec 29 '14 at 22:23
  • This is a bit dense for a comment thread, but I think the coefficients of `X` are NSD because the estimate from the fit to `log(Y)~X` is relatively poor (p=0.015), while the estimate from the fit to `Y~X` using poisson glm is much better (p<2e-16). So, yes the CL around b1 for the "incorrect" fit does include the other b1. When I said the poisson glm fit was better I was reacting to the SSE. Defining this as sum((Y.obs-Y.fitted)^2), the poisson glm gives SSE=6611 whereas the `log(Y)~X` fit gives SSE=29081. – jlhoward Dec 30 '14 at 00:28
  • indeed, but for what it's worth the `p<2e-16` is due entirely to overdispersion -- this really needs a model that allows overdispersion. – Ben Bolker Dec 30 '14 at 00:48
  • I think fit.fixed corresponds most closely to my experience. All I need is a function y = ax + b with a known value of b, so I can draw my curve and find the x values for the y values I want. Just one more question, please: for example, for y = 50, is x = (log(50) - log(300))/fit.fixed$coefficients right? Thanks a lot – Cherif Dec 30 '14 at 12:53
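
The inversion in that last comment is correct for fit.fixed: that model is Y = 300*exp(b1*X), so X = (log(Y) - log(300))/b1. A minimal check, reusing fit.fixed from the answer above (x50 is just an illustrative name):

b1 <- coef(fit.fixed)                  # the single slope coefficient
x50 <- (log(50) - log(300))/b1         # X at which the fitted curve hits Y = 50
predict(fit.fixed, type="response",
        newdata=data.frame(X=x50, b0=log(300)))   # should return ~50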
data2 <- data.frame( X = c(0,2,4,6,8,10),
                     Y = c(300,220,210,90,80,10))
m1 <- lm(log(Y)~X-1+offset(rep(log(300),nrow(data2))),data2)

There is a predict() function, but here it's probably easier to just predict by hand.

par(las=1,bty="l")
plot(Y~X,data=data2)
curve(300*exp(coef(m1)*x),add=TRUE)
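
If you do want predict(), note that the model is on the log scale and the offset is baked into the formula, so exponentiate the result. A small sketch reusing m1 (x.for.y is just an illustrative helper name):

exp(predict(m1))                 # fitted Y values for the original data
# solving for X at a target Y, since log(Y) = log(300) + b1*X:
x.for.y <- function(y) unname((log(y) - log(300))/coef(m1))
x.for.y(50)                      # X at which the fitted curve reaches Y = 50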

[plot: the data2 points with the fitted curve 300*exp(coef(m1)*x)]

For what it's worth, if you want to compare log-Normal and Poisson models, you can do it via

library("ggplot2")
 theme_set(theme_bw())
 ggplot(data2,aes(X,Y))+geom_point()+
    geom_smooth(method="glm",family=quasipoisson)+
    geom_smooth(method="glm",family=quasi(link="log",var="mu^2"),
      colour="red",fill="red")

[plot: the two smooths, quasipoisson (default colour) and quasi with log link and mu^2 variance (red)]

Ben Bolker