5

I might not find an answer here because I don't think the revoScaleR package is widely used.

If I create a GLM using rxGlm() it works fine. However the model residuals available via rxPredict() seem to just be the "raw" residuals, ie observed value minus fitted value. The various transformed versions (deviance residuals, pearson residuals, etc.) don't seem to be available.

Does anyone know if there's a way to achieve this? I can get deviance residuals (for example) for the model by running it again using glm() (with the same formula, data, error structure, link function, weights) and using residuals(glm_object, type = "deviance"), but this is a nuisance because glm() runs very slowly (large dataset, many model parameters).

Thanks.

Edited: to include this guidance from the literature which I'm trying to follow:

enter image description here

Alan
  • 619
  • 6
  • 19
  • Could you specify which of the residual type you want (out of `deviance` (default), `pearson`, `working`, `response`, and `partial`)? I do not think that `revoScaleR` offers these options, but one could write code to compute the residuals. – broti Apr 11 '20 at 16:53
  • I *think* ideally I’d like standardised deviance residuals as a first choice. Thank you. – Alan Apr 11 '20 at 16:54
  • Do you mean as they are defined in this post: https://stats.stackexchange.com/a/99723/240805? i.e. "standardized" or "internally studentized" residuals? – broti Apr 11 '20 at 18:55
  • Perhaps...I'm afraid I'm a little rusty with the maths here. I've edited my OP to include a screenshot from a .pdf file which describes the thing I'm trying to create...is that the same thing? – Alan Apr 13 '20 at 11:54
  • Thanks for the screenshot, starting to see where you're going. To fully understand, can you say what phi, zeta and omega stand for from your book? Also, are you dealing with a binary outcome in your data? Residuals are more complicated in that case... – broti Apr 15 '20 at 07:47

1 Answers1

3

It is a bit hard to fully understand from your question what the RevoScaleR package provides in terms of residuals and which residuals exactly you need. In addition, there is quite some confusion regarding the terminology of residuals, as this exemplified for example here and here.

A few points/observations which might help you nonetheless.

In linear regression, "raw" are identical to "deviance" residuals

At least that what I take from running toy regressions with glm and predicting outcomes like:

df <- mtcars
modl <- glm(formula = mpg ~ wt + qsec + am, data = mtcars)
y_hat <- predict(modl)

Next, calculate the "raw" residuals (predicted outcome minus actual outcome) as well as deviance residuals:

y <- as.vector(df[["mpg"]])
res_raw <- y - y_hat
res_dev <- residuals(modl, type = "deviance")

These two are identical:

identical(res_raw, res_dev)
[1] TRUE

I guess it's more complicated once you get into binary outcomes etc.

Formula for computing standardized deviance residuals

Standardized deviance residuals are computed from glm with the rstandard method.

res_std <- rstandard(modl)

Looking at getAnywhere(rstandard.glm) tells you how the standardized residuals can be computed by hand from the deviance residuals:

function (model, infl = influence(model, do.coef = FALSE), type = c("deviance", 
    "pearson"), ...) 
{
    type <- match.arg(type)
    res <- switch(type, pearson = infl$pear.res, infl$dev.res)
    res <- res/sqrt(summary(model)$dispersion * (1 - infl$hat)) # this is the key line
    res[is.infinite(res)] <- NaN
    res
}

So in my example, you would manually compute standardized residuals by running res/sqrt(summary(modl)$dispersion * (1 - influence(modl)$hat)). So you need two things: hat and dispersion. I assume that RevoScaleR provides the dispersion parameter. If there is nothing in RevoScaleR like influence(modl)$hat to get the hat values, you'll have to do it from scratch:

X <- as.matrix(df[, c("wt", "qsec", "am")]) # Gets the X variables
X <- cbind(rep(1, nrow(df)), X) # adds column for the constant
hat <- diag(X %*% solve(t(X) %*% X) %*% t(X)) # formula for hat values

Now compute your standardized deviance residuals:

res_man <- res_raw/sqrt(summary(modl)$dispersion * (1 - hat))

Which are the same as derived with rstandard:

head(res_man)
        Mazda RX4     Mazda RX4 Wag        Datsun 710    Hornet 4 Drive Hornet Sportabout           Valiant 
       -0.6254171        -0.4941877        -1.4885771         0.2297471         0.7217423        -1.1790097 
head(res_std)
        Mazda RX4     Mazda RX4 Wag        Datsun 710    Hornet 4 Drive Hornet Sportabout           Valiant 
       -0.6254171        -0.4941877        -1.4885771         0.2297471         0.7217423        -1.1790097 
broti
  • 1,338
  • 8
  • 29