Here are the results of some archaeology that explains what's going on, deep within the glm function:
Debugging (with debug("glm")) and stepping through the function shows that it fails at the following call:
if (length(offset) && attr(mt, "intercept") > 0L) {
    fit$null.deviance <- eval(call(if (is.function(method)) "method" else method,
        x = X[, "(Intercept)", drop = FALSE], y = Y, weights = weights,
        offset = offset, family = family, control = control,
        intercept = TRUE))$deviance
}
This is an attempt to calculate the null deviance for the model. It's only evaluated if there's an intercept term and an offset term (I'm not sure why; it may be that the default null deviance calculated by the previous call to glm.fit is wrong in that case and must be recalculated?). It calls glm.fit (the default value of method), but without starting values, because these are usually unnecessary for the intercept-only model.
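To see what that refit amounts to in isolation, here's a minimal sketch (the y and off values are made up, with a zero in the response) of the equivalent intercept-only glm.fit call: the offset is passed along, but no starting values are:

y <- c(0, 2, 5, 9)
off <- rep(1, 4)
## fails with "cannot find valid starting values: please specify some"
glm.fit(x = matrix(1, nrow = 4, ncol = 1), y = y, offset = off,
        family = gaussian(link = "log"), intercept = TRUE)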
Now debugging inside glm.fit to see what happens: we get (within a call to the family function, gaussian()) to:
if (is.null(etastart) && is.null(start) && is.null(mustart) &&
    ((family$link == "inverse" && any(y == 0)) ||
     (family$link == "log" && any(y <= 0))))
    stop("cannot find valid starting values: please specify some")
and we see that the fit fails because the starting values were not passed through, a log link is used, and some y values are equal to zero. So this is a case that should happen if (and only if?) an offset and an intercept are both specified, a log link is used, and there are zero values in the response.
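For the record, a minimal made-up example that meets all three conditions reproduces the problem: the main fit succeeds because start is supplied, but the glm call still dies at the null-deviance step with the error above:

set.seed(101)
x <- 1:20
off <- rep(1, 20)
y <- round(exp(1 + 0.1 * x + off) + rnorm(20))
y[1] <- 0  ## force a zero value in the response
## main fit works (start is given), but the null-deviance refit errors out
glm(y ~ x, family = gaussian(link = "log"), offset = off, start = c(1, 0.1))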
If you dump("glm", file = "glmtemp.R"); add the line

start = start[1], etastart = etastart[1], mustart = mustart[1],

to the call that calculates the null deviance (i.e. the one shown above); and source("glmtemp.R"), it seems to work OK ... I think this should be a reasonable general solution. If anyone wants to bring this issue up on the R development list, feel free.
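For concreteness, the patched call in glmtemp.R ends up looking roughly like this (the same call as quoted above, with the extra line spliced in; note that NULL[1] is still NULL, so nothing changes when the user didn't supply any starting values):

fit$null.deviance <- eval(call(if (is.function(method)) "method" else method,
    x = X[, "(Intercept)", drop = FALSE], y = Y, weights = weights,
    start = start[1], etastart = etastart[1], mustart = mustart[1],
    offset = offset, family = family, control = control,
    intercept = TRUE))$deviance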