This can happen when the coefficients produce a positive linear predictor for some observations, which in log-binomial regression means P(success) > 1. Supplying different starting values can help avoid this situation.
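To see why, note that with a log link the fitted probability is exp(linear predictor), so any observation whose linear predictor is above 0 gets a "probability" above 1. A tiny illustration in R with made-up numbers:
eta <- c(-0.5, -0.1, 0.2)   # hypothetical linear predictors (X %*% beta)
exp(eta)                    # 0.607 0.905 1.221 -- the last is not a valid probability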
The short answer is to add start=coefini to the glm call, where coefini is a vector whose length equals the number of predictors plus 1 (for the intercept). So your call becomes
glm(y ~ a + b + c + d, data = data,
family = binomial(link="log"), start=coefini)
But choosing which values to supply can be challenging.
The question/answer in this other community was useful to me in computing starting values for this kind of analysis.
That question is about starting values for LOGISTIC regression, and its answer says that "you can get them using [linear regression] by regressing the logit of the response, y, on the predictors with weight ny(1-y)".
In that question, the binary (1/0) dependent variable is remiss, and the answer provides these steps to obtain starting values for a logistic regression:
y=.1*(remiss=0)+.9*(remiss=1)
logit=log(y/(1-y))
wt=y*(1-y)
Then the starting values come from the weighted linear regression of logit on the predictors of interest.
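In R terms (my own translation of those steps, with the data frame df and predictors x1, x2 invented here as placeholders), that logistic recipe would be something like:
df$y0    <- 0.1 * (df$remiss == 0) + 0.9 * (df$remiss == 1)   # pull the 0/1 response off the boundary
df$logit <- log(df$y0 / (1 - df$y0))
df$wt    <- df$y0 * (1 - df$y0)
start_logit <- coef(glm(logit ~ x1 + x2, weights = wt, data = df))  # weighted linear regression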
But I adjusted a couple of things for link="log" instead of "logit": Instead of
logit = log(y/(1-y))
wt = y*(1-y)
I used
depvar = log(y)
wt = y/(1-y)
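For what it's worth, I believe this change matches the usual IWLS working weights for a GLM, w = (dμ/dη)² / V(μ): with the logit link that works out to μ(1−μ), while with the log link (where dμ/dη = μ and V(μ) = μ(1−μ)) it becomes μ/(1−μ), and the working response is log(μ) rather than the logit.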
Then the starting values for the log-binomial model are the coefficients from the weighted linear regression of depvar on the same predictors. If these coefficients are stored in a vector called coefini, then just add start=coefini to your glm command.
I am more into Python than R these days, but with your example I believe this means doing the following:
data$y0     <- 0.1 * (data$y == 0) + 0.9 * (data$y == 1)   # pull the 0/1 response off the boundary
data$depvar <- log(data$y0)                                 # working response for the log link
data$wt     <- data$y0 / (1 - data$y0)                      # working weights for the log link
# weighted linear regression (glm's default gaussian family) gives the starting values
coefini <- coef(glm(depvar ~ a + b + c + d, weights = wt, data = data))
glm(y ~ a + b + c + d, data = data,
family = binomial(link="log"), start=coefini)
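In case it is useful, here is a self-contained sketch on simulated data (the sample size and true coefficient values are invented here) just to confirm that the two-step recipe runs end to end:
set.seed(1)
n <- 500
data <- data.frame(a = runif(n), b = runif(n), c = runif(n), d = runif(n))
p <- exp(-2 + 0.3 * data$a + 0.2 * data$b - 0.1 * data$c + 0.4 * data$d)  # true P(y = 1), all below 1
data$y <- rbinom(n, size = 1, prob = p)
data$y0     <- 0.1 * (data$y == 0) + 0.9 * (data$y == 1)
data$depvar <- log(data$y0)
data$wt     <- data$y0 / (1 - data$y0)
coefini <- coef(glm(depvar ~ a + b + c + d, weights = wt, data = data))
fit <- glm(y ~ a + b + c + d, data = data,
           family = binomial(link = "log"), start = coefini)
summary(fit)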