User-specified link function in R for glm. How to? (no documentation found, what are the arguments to use, etc.)

Question

This question has already been somewhat addressed already in the past on this site, but the answers provided are not fully helpful to me. Here are the details of my questions that are actually somewhat different from what has already been discussed here:

After working hard on this, I remained unable to understand how I can define my own user-specified link function in R for glm. I have several questions on this.

First of all, I understand I have to write my own function (likely modifying one that already exists), and - in it - I need to define the following elements:

linkfun: the link function.
linkinv: the inverse of the link function, as a function of "eta".
mu.eta: the first derivative of the invlink respect to eta.
valideta: that must return TRUE if the value of eta are in the correct interval

And return all of this in a list element.

So far, so good.

Here is the first set of my questions:

The link function is sometimes defined as a function of "y" and sometimes as a function of "mu". What must be done in this respect?
Let's take an example, and type make.link("sqrt"). We then indeed discover that linkfun is sqrt(mu), linkinv is eta^2, mu.eta is 2*eta. So far, so good. However, if you look at make.link("log"), mu.eta is not simply exp(eta) as it should, but pmax(exp(eta), .Machine$double.eps) (i.e., the maximal values of the first derivative for all the eta vector). Why? I remained unable to understand this.
Just for my curiosity, why the algorithm needs the first derivative of the invlink respect to eta? This is not fully clear to me.

In my specific case, I need a quasi-logistic regression for binomial data. Instead of having a standard logit function log(p/(1-p)), I need to have the slightly modified link function (if p is defined as Y/N): log((Y+0.5)/(N-Y+0.5)).

My other question in this case is:
I remained unable to built this.. Can someone give me some hints?
Where can I find a detailed explanation of all of this? I have looked at the good old Chambers & Hastie book (1992), but the explanation is not sufficient. Are there any detailed courses available on the web, etc.?

Have a look at this link: http://stackoverflow.com/questions/15931403/modify-glm-function-to-adopt-user-specified-link-function-in-r — hlh3406, Aug 10 '15 at 11:02

thothal · Answer 1 · 2015-08-10T12:35:35.140

Not sure whether I can answer all of your questions, but I give it a try:

Can you specify a linkfun which takes mu and y? Up to my knowledge, the link function should only tkae mu as the GLM (as opposed to the LM) models a function of the expecetd value mu (aka linkfunction) instead of the expecetd value itself. Hence, there should be only mu as an argument.
This has to do with vectorization. pmax returns the parallel maxima and we want to assure that we do not report values smaller than Machine$double.eps. So the linkfun does not return the maximum of all exp(eta) (that would be max(exp(eta), .Machine$double.eps)). Imagine now that eta is now a vector of all eta for which you want to calculate then linkinv, with pmax(.) you make sure that you return exp(eta) only in these cases where it is indeed larger than .Machine$double.eps. So you return also a vector of maxima. (try pmax(1:6, 4) you will get [1] 4 4 4 4 5 6)
You need the first derivative in order to calcuate the estimator of the score function of dL / dbeta[j] = sum_i^n((y[i] - mu[i])/(a(phi[i] * V(mu[i])x[ij]/g'(mu[i]) = 0. That is the derivative of the likelihood function w.r.t. to beta[j] (i.e. dL/dbeta[j]) depends on:
- a(phi[i]) is a (known) function of the dispersion parameter coming from the respective distribution (e.g. a(phi) = phi = sigma^2 for the normal distribution)
- V(mu[i]) for distributions of the exponential family (for which the GLM was designed) you can derive that var(Y) can be written as a(phi) * V(mu) indicating that the variance is indeed a function of the mean.
- g'(mu[i]) is finally the derivative of the link function. So in order to solve the score function (thus to get estimates for beta[j], you will need the derivative of the link function
So in your case you need to define:
- the linkfun
- the inverse
- the derivative
- function to validate eta
I see your problem that you link function would also need to take y as an parameter, however, I am not sure whether the glm can deal with it, because in its fitting mechanism it will call linkfun at some point and looking at the pre-defined linkfuns, all of these require just one parameter. You could get around with that if you twist the code of glm but this will be quite some work to do (all things untested and just as food for thoughts without any guarantee that it will work):
- Provide your linkfun/linkinvers etc as something like function(mu, y) [whatever you want to have here]
- Create a copy of glm.fit (glm.fit2 say)
- Change calls fo linkfun(mu), linkinv(eta) etc to linkfun(mu, y), linkinv(eta, y) and so forth
- when you call your glm provide method = "glm.fit2" to tell glm that it should use your own fitting procedure.
The refernce book for that is McCullagh, Nelder: Generalized Linear Models. Where you find all the explanations about the exponential family of distributions, score functions etc.

You can look into function powerVarianceFamily of package EQL which also uses parameterized families for extended quasi likelihood estaimation.

Update

As just learned from the excellent answer in the previous post, no need to redefine the glm.fit as long as you use y in you linkfun, as by the time linkfun is called y should be known in the encapsualting function. So you should define linkfun like this

function(mu) [a function which uses mu and y - 
   as y is known within the context where this function is called]

Thanks for this answer. Really helpful! However, I do not understand the last part. First of all, 'linkfun' should be defined in terms of 'mu', but then you explain that is should/could also actually be defined in terms of 'y' (as this is the case here: [link](http://stackoverflow.com/questions/15931403/modify-glm-function-to-adopt-user-specified-link-function-in-r) for example). Can you please clarify this part (end your update)? Thanks, Eric. — Eric Wajnberg, Aug 10 '15 at 12:14
When I've asked for a place where a detailed explanation can be found, I was talking about explanation about how this can be done in R. The book from McCullagh & Nelder (that I know very well) is not providing any help about R.. Eric. — Eric Wajnberg, Aug 10 '15 at 12:33
Well, when I wrote my reply I was not aware how you could use `y` in your `linkfun`. However, as I read the other link I realized that you can incorporate `y` in your linkfun _without_ specifying it as an explicit argument to your linkfunction, as `y` is known within `glm.fit` where your linkfunction is called. So all you have to do is to use `y` in your funciton body, but do not include it in the argument list. — thothal, Aug 10 '15 at 12:38
If you read McCullagh you will see why you need the derivative of the link function ;) — thothal, Aug 10 '15 at 12:39
Ok, I see. Thanks in all cases for your time & help. I'll try to solve this on my side now! This surprises me a bit, however, that there is apparently nothing (web sites, books, etc.) where all of this is explained in details (i.e., how to implement this in R). Cheers, Eric. — Eric Wajnberg, Aug 11 '15 at 07:44

User-specified link function in R for glm. How to? (no documentation found, what are the arguments to use, etc.)

1 Answers1