It is a bit hard to fully understand from your question what the RevoScaleR
package provides in terms of residuals and which residuals exactly you need. In addition, there is quite some confusion regarding the terminology of residuals, as this exemplified for example here and here.
A few points/observations which might help you nonetheless.
In linear regression, "raw" are identical to "deviance" residuals
At least that what I take from running toy regressions with glm
and predicting outcomes like:
df <- mtcars
modl <- glm(formula = mpg ~ wt + qsec + am, data = mtcars)
y_hat <- predict(modl)
Next, calculate the "raw" residuals (predicted outcome minus actual outcome) as well as deviance residuals:
y <- as.vector(df[["mpg"]])
res_raw <- y - y_hat
res_dev <- residuals(modl, type = "deviance")
These two are identical:
identical(res_raw, res_dev)
[1] TRUE
I guess it's more complicated once you get into binary outcomes etc.
Formula for computing standardized deviance residuals
Standardized deviance residuals are computed from glm
with the rstandard
method.
res_std <- rstandard(modl)
Looking at getAnywhere(rstandard.glm)
tells you how the standardized residuals can be computed by hand from the deviance residuals:
function (model, infl = influence(model, do.coef = FALSE), type = c("deviance",
"pearson"), ...)
{
type <- match.arg(type)
res <- switch(type, pearson = infl$pear.res, infl$dev.res)
res <- res/sqrt(summary(model)$dispersion * (1 - infl$hat)) # this is the key line
res[is.infinite(res)] <- NaN
res
}
So in my example, you would manually compute standardized residuals by running res/sqrt(summary(modl)$dispersion * (1 - influence(modl)$hat))
. So you need two things: hat
and dispersion
. I assume that RevoScaleR
provides the dispersion parameter. If there is nothing in RevoScaleR
like influence(modl)$hat
to get the hat values, you'll have to do it from scratch:
X <- as.matrix(df[, c("wt", "qsec", "am")]) # Gets the X variables
X <- cbind(rep(1, nrow(df)), X) # adds column for the constant
hat <- diag(X %*% solve(t(X) %*% X) %*% t(X)) # formula for hat values
Now compute your standardized deviance residuals:
res_man <- res_raw/sqrt(summary(modl)$dispersion * (1 - hat))
Which are the same as derived with rstandard
:
head(res_man)
Mazda RX4 Mazda RX4 Wag Datsun 710 Hornet 4 Drive Hornet Sportabout Valiant
-0.6254171 -0.4941877 -1.4885771 0.2297471 0.7217423 -1.1790097
head(res_std)
Mazda RX4 Mazda RX4 Wag Datsun 710 Hornet 4 Drive Hornet Sportabout Valiant
-0.6254171 -0.4941877 -1.4885771 0.2297471 0.7217423 -1.1790097