3

I am having trouble making a plot of standarised residuals versus a covariate match a plot shown in Pinhiero and Bates Mixed-Effects Models in S and S-Plus. The model being plotted is a general formulation of a nonlinear mixed-effects model, contained in the nlme package

library(nlme)
options(contrasts = c("contr.helmert", "contr.poly"))
fm1Dial.gnls <- gnls(rate ~ SSasympOff(pressure, Asym, lrc, c0),
                     data = Dialyzer,
                     params = list(Asym + lrc ~ QB, c0 ~ 1),
                     start = c(53.6, 8.6, 0.51, -0.26, 0.225))

When we plot the standardised residuals versus the transmembrane pressure in this model

plot(fm1Dial.gnls, resid(.) ~ pressure, abline = 0)

The resulting plot shows evidence of heteroscedasticity across different pressures. Thus we fit a new model with a power variance function to account for this.

fm2Dial.gnls <- update(fm1Dial.gnls, weights = varPower(form = ~ pressure))

Which is clearly superior to the first model

anova(fm1Dial.gnls, fm2Dial.gnls)

However when we plot the standarised residuals versus transmembrane pressure for the new improved model

plot(fm2Dial.gnls, resid(.) ~ pressure, abline = 0)

The plot doesn't look like much of an improvement on the first plot, with the vertical spread of the residuals still appearing to be much higher at higher pressures.

The plot for the second improved model in Pinhiero and Bates, however. shows a similar vertical spread of residuals at all levels of pressure, which makes sense given that the heteroscedasticity has explicitly been accounted for in this improved model.

What am I doing wrong?

llewmills
  • 2,959
  • 3
  • 31
  • 58
  • I suspect you'll get better help at https://stats.stackexchange.com/ If you don't get an answer here, consider migrating the question to there – dww Dec 30 '18 at 06:22
  • Thanks @dww but it seemed to be more of a software syntax issue than a pure stats issue. – llewmills Dec 30 '18 at 09:45
  • It seems like the graphs match the book better if one specifies the normalised residuals via `plot(fm2Dial.gnls, resid(., type = "n) ~ pressure, abline = 0)`, however the book specifically refers to 'standardised residuals'. So I am still unsure what is going wrong with the second plot in my example, and why the standardised residuals vs pressure plots don't seem to change despite the introduction of a variance function to the model. – llewmills Dec 30 '18 at 10:02
  • Actually `plot(fm2Dial.gnls, resid(., type = "p") ~ pressure, abline = 0)` also produces plots that match the book. The `type = "p"` argument in `plot.lme` indicates pearson's standardised residuals. Which begs the question what the default is, if not the standardised residuals? – llewmills Dec 30 '18 at 10:22
  • 1
    re your comment ^^; if you look at the code for `nlme:::residuals.gnls`, if `type` is unspecified than `match.arg` grabs the first value, hence the default are the raw residuals. (that's not to say that this was the same when the book was written). (ps: from `?nlme:::residuals.gnls` Pearson == standardised) – user20650 Dec 30 '18 at 12:35

1 Answers1

1

Where you were wrong is saying that

plot(fm2Dial.gnls, resid(.) ~ pressure, abline = 0)

are standardized residuals while they indeed are not. You correctly found that

plot(fm2Dial.gnls, resid(., type = "p") ~ pressure, abline = 0)

or, more completely,

plot(fm2Dial.gnls, resid(., type = "pearson") ~ pressure, abline = 0)

gives the same plot as in the book and those are standardized residuals.

?residuals.gnls explains quite a bit:

type --- an optional character string specifying the type of residuals to be used. If "response", the "raw" residuals (observed - fitted) are used; else, if "pearson", the standardized residuals (raw residuals divided by the corresponding standard errors) are used; else, if "normalized", the normalized residuals (standardized residuals pre-multiplied by the inverse square-root factor of the estimated error correlation matrix) are used. Partial matching of arguments is used, so only the first character needs to be provided. Defaults to "response".

From this description we also see why choosing type as "normalized" and "pearson" gives the same result: the former option would take into account the dependence structure of the error, but since we only relaxed the homoskedasticity assumption, we still have no dependence. That is also evident in nlme:::residuals.gnls in

if (type != "response") {
    val <- val/attr(val, "std")
    lab <- "Standardized residuals"
    if (type == "normalized") {
        if (!is.null(cSt <- object$modelStruct$corStruct)) {
            val <- recalc(cSt, list(Xy = as.matrix(val)))$Xy[, 
              1]
            lab <- "Normalized residuals"
        }
    }
}
Julius Vainora
  • 47,421
  • 9
  • 90
  • 102
  • Thanks again @Julius Vainora, so the default are raw residuals. I was looking in the help of `plot.lme` not `residuals.lme`. Why do these raw residuals show little change even after heteroscedasticity is built into the model, but the pearson and normalisresiduals *do* show change? – llewmills Dec 30 '18 at 20:18
  • @llewmills, 1) with `weights = varPower(form = ~ pressure)` we only model weights (or standard deviations) for each observation; we don't add any new regressors to the equation. Given those weights, we then indeed obtain new estimates and new raw residuals. In general, those estimates and raw residuals (under the two models) don't have to be similar. But they do happen to be similar in this case; we can think that negative and positive deviations due to the weights somewhat cancel out.. Or perhaps that `SSasympOff` cannot utilize this information as much as we would like it to. Hard to say. – Julius Vainora Dec 30 '18 at 23:51
  • 2) as I said, we don't add any new terms to the equation, just try to explain the structure of those raw residuals. In the first model we say that the variance is constant, so dividing by the standard deviation (a constant) just rescales everything. Now in the second model we have similar raw residuals, but now we believe that their scale isn't constant and depends on `pressure`. We try to estimate this scale (with `varPower`) and now divide raw residuals by a different standard deviation for each observation. Hopefully this helps. I also answered https://stackoverflow.com/q/52884290/1320535. – Julius Vainora Dec 30 '18 at 23:59