I was struggling with this a couple of weeks ago, and while I don't consider this a definitive answer, hopefully it is still helpful. For what it's worth, McCullagh and Nelder directly acknowledge this mismatch between the canonical link and the support of the mean. They advise constraining the βs so that the fitted means stay in the valid range. Here's the relevant passage:
"The canonical link function yields sufficient statistics which are linear functions of the data and it is given by η = 1/μ. Unlike the canonical links for the Poisson and binomial distributions, the reciprocal transformation, which is often interpretable as the rate of a process, does not map the range of μ onto the whole real line. Thus the requirement that η > 0 implies restrictions on the βs in any linear model. Suitable precautions must be taken in computing β_hat so that negative values of μ_hat are avoided."
-- McCullagh and Nelder (1989). Generalized Linear Models. p. 291
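To make the restriction in that passage concrete, here is a minimal sketch in Python with NumPy. The design matrix, the coefficient values, and the helper name `inverse_link_mean` are all made up for illustration; the only thing taken from the passage is the reciprocal link μ = 1/η.

```python
import numpy as np

def inverse_link_mean(X, beta):
    """Mean implied by the reciprocal link: mu = 1 / eta, with eta = X @ beta."""
    eta = X @ beta
    return 1.0 / eta

# Hypothetical design: an intercept plus one covariate taking values 0..4.
x = np.linspace(0.0, 4.0, 5)
X = np.column_stack([np.ones_like(x), x])

# Illustrative coefficients (assumptions, not estimates from any data).
beta_ok = np.array([0.5, 0.25])   # eta stays positive over this x range
beta_bad = np.array([0.5, -0.2])  # eta changes sign between x = 2 and x = 3

mu_ok = inverse_link_mean(X, beta_ok)
mu_bad = inverse_link_mean(X, beta_bad)

print("all means positive (beta_ok): ", bool(np.all(mu_ok > 0)))
print("all means positive (beta_bad):", bool(np.all(mu_bad > 0)))
```

Whether a given β is admissible depends on the rows of X it is paired with: the same slope that fails here would be fine if x never exceeded 2.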
It depends on your X values, but as far as I can tell (please correct me someone!) in an MCMC-based Bayesian case, you can achieve this either by using a truncated prior on the βs or by putting a strong enough prior on your intercept that the inappropriate regions are effectively never reached.
In my case, I ultimately used an identity link with a strong prior that kept the intercept positive, and that was sufficient and yielded reasonable results.
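As a rough illustration of those two options, the sketch below uses plain NumPy rather than any particular MCMC library. The design matrix, the prior locations and scales, and the helper names (`feasible`, `sample_truncated_prior`, `feasible_fraction`) are all assumptions made up for this example. The feasibility condition η > 0 is the same for the reciprocal and the identity link, since both need a positive linear predictor to give a positive mean.

```python
import numpy as np

def feasible(X, beta):
    """True when every linear predictor value X @ beta is strictly positive."""
    return bool(np.all(X @ beta > 0.0))

def sample_truncated_prior(X, loc, scale, n_draws, rng, max_tries=100_000):
    """Rejection-sample a normal prior restricted to {beta : X @ beta > 0}."""
    draws = []
    for _ in range(max_tries):
        beta = rng.normal(loc, scale)  # one candidate coefficient vector
        if feasible(X, beta):
            draws.append(beta)
            if len(draws) == n_draws:
                break
    return np.array(draws)

def feasible_fraction(X, loc, scale, rng, n=5000):
    """Share of untruncated prior draws that keep every eta positive."""
    betas = rng.normal(loc, scale, size=(n, len(loc)))
    return float(np.mean(np.all(betas @ X.T > 0.0, axis=1)))

rng = np.random.default_rng(0)
x = np.linspace(0.0, 4.0, 5)               # hypothetical covariate values
X = np.column_stack([np.ones_like(x), x])  # intercept + slope

# Option 1: truncated prior. Every retained draw satisfies eta > 0.
draws = sample_truncated_prior(
    X, loc=np.array([1.0, 0.0]), scale=np.array([1.0, 0.5]), n_draws=200, rng=rng
)

# Option 2: no truncation, but a tighter intercept prior far from zero.
weak = feasible_fraction(X, np.array([1.0, 0.0]), np.array([1.0, 0.5]), rng)
strong = feasible_fraction(X, np.array([5.0, 0.0]), np.array([0.5, 0.5]), rng)
print("feasible share, weak intercept prior:  ", weak)
print("feasible share, strong intercept prior:", strong)
```

The second option only shrinks the infeasible prior mass; it does not remove it, so how strong is "strong enough" still depends on the range of X and on the slope priors.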
Also, the choice of link really depends on your X. As the passage above implies, using the canonical link assumes that your linear model lives in rate space. Log and identity links also appear to be very common, and ultimately it's about choosing a space in which a linear function has enough range to capture the response.
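For a quick side-by-side of the three links mentioned here, this small sketch applies each inverse link to the same linear predictor values. The η values are arbitrary and chosen only so that some of them are negative.

```python
import numpy as np

# Inverse link functions: each maps the linear predictor eta back to the mean mu.
inverse_links = {
    "inverse": lambda eta: 1.0 / eta,   # reciprocal link, mu = 1 / eta
    "log": np.exp,                      # log link, mu = exp(eta)
    "identity": lambda eta: eta,        # identity link, mu = eta
}

# Arbitrary linear predictor values, including negative ones.
eta = np.array([-1.0, -0.5, 0.5, 1.0, 2.0])

means = {name: g_inv(eta) for name, g_inv in inverse_links.items()}
valid = {name: bool(np.all(mu > 0)) for name, mu in means.items()}

for name in inverse_links:
    print(f"{name:>8}: all means positive = {valid[name]}")
```

Only the log link returns a positive mean for every real η; the reciprocal and identity links both rely on the βs (or the priors on them) keeping η positive.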