11

I want to run a Gaussian GLM with a log link and an offset. The following problems arise:

y <- c(1,1,0,0)
t <- c(5,3,2,4)

No problem:

exp(coef(glm(y~1 +  offset(log(t)), family=poisson)))

with family=gaussian, starting values need to be specified, it works here:

exp(coef(glm(y~1, family=gaussian(link=log), start=0)))

but does not work here:

exp(coef(glm(y~1 +  offset(log(t)), family=gaussian(link=log), start=0)))

Error in eval(expr, envir, enclos) : cannot find valid starting values: please specify some"

Does anyone see what's wrong (hopefully just in my coding) ?

Ben Bolker
  • 211,554
  • 25
  • 370
  • 453
Andi
  • 111
  • 1
  • 1
  • 4
  • 1
    "does not work" is less useful than the actual error message, which is "Error in eval(expr, envir, enclos) : cannot find valid starting values: please specify some" I played w/ some simple `glm(y~1+offset(junk))` and all worked fine. I think you have a very small data set and some rather unlikely offsets, so glm simply can't find a fit. – Carl Witthoft Nov 21 '11 at 14:08
  • My "very small data set" is what people call minimal example let you – Andi Nov 21 '11 at 14:28
  • Is it not because you are trying to take to log of 0? – James Nov 21 '11 at 15:06
  • No, starting value should be on linear predictor scale, that means it starts with beta_zero=0. Other starting values same error. Wonder why R asks for starting values when they are actually specified... – Andi Nov 21 '11 at 15:29
  • 1
    @Andi I was meaning taking the log of 0 in the `y` values. Try it with non-zero positive `y` and it will work, and with no `start` required. – James Nov 21 '11 at 15:37

2 Answers2

14

Here are the results of some archaeology that explains what's going on, deep within the glm function:

Debugging (with debug("glm")) and stepping through the function shows that it fails at the following call:

if (length(offset) && attr(mt, "intercept") > 0L) {
  fit$null.deviance <- eval(call(if (is.function(method)) "method" else method, 
    x = X[, "(Intercept)", drop = FALSE], y = Y, weights = weights, 
    offset = offset, family = family, control = control, 
    intercept = TRUE))$deviance
}

This is an attempt to calculate the null deviance for the model. It's only evaluated if there's an intercept term and an offset term (I'm not sure why; it may be that the default null deviance calculated by the previous call to glm is wrong in that case and must be recalculated?). It calls glm.fit (the default value of method), but without starting values because these are usually unnecessary for the intercept-only model.

Now debugging inside glm.fit to see what happens: we get (within a call to the family function, gaussian()) to:

  if (is.null(etastart) && is.null(start) && is.null(mustart) && 
    ((family$link == "inverse" && any(y == 0)) || (family$link == 
        "log" && any(y <= 0))))
    stop("cannot find valid starting values: please specify some")

and we see that because the starting values were not passed through, because a log link is used, and because some y values are equal to zero, the fit fails. So this is a case that should happen if (and only if?) an offset and an intercept are both specified, a log link is used, and there are zero values in the response.

If you dump("glm",file="glmtemp.R"); add the line

    start = start[1], etastart = etastart[1], mustart = mustart[1],

to the call that fits the null deviance (i.e. the one shown above); and source("glmtemp.R"), it seems to work OK ... I think this should be a reasonable general solution. If anyone wants to bring this issue up on the R development list, feel free.

Ben Bolker
  • 211,554
  • 25
  • 370
  • 453
  • With your addition to the call the technical problem is solved. Thank you! Can you also give me a hint why the estimated parameter does differ from the poisson-glm ? (same offset, same log-link) Without an offset term both estimates are identical. – Andi Nov 22 '11 at 10:29
  • I don't have time to dig through this, but why would you expect them to be the same? I think it's a special case that the non-offset models are identical ... I would just work through the definition of the model step by step (i.e. find the minimum expected variance-weighted sum of squares on the link scale, with or without the offset) – Ben Bolker Nov 22 '11 at 12:50
  • 2
    Nice digging! :) I think there is a better fix though: just add `mustart = fit$fitted.values`. I've submitted a patch at https://bugs.r-project.org/bugzilla/show_bug.cgi?id=16877 . – bastistician Aug 06 '20 at 10:54
10

I looks like start isn't being recognised when offset is present. You are trying to take the log of 0 in the y values which is -Inf. glm obviously cannot deal with this when looking for a solution without being given some help by start. A small perturbation in your y values will permit a solution.

exp(coef(glm(I(y+.Machine$double.eps)~1 + offset(log(t)), family=gaussian(link=log))))
(Intercept) 
  0.1481481
James
  • 65,548
  • 14
  • 155
  • 193