3

I'm teaching a modeling class in R. The students are all SAS users, and I have to create course materials that exactly match (when possible) SAS output. I'm working on the Poisson regression section and trying to match PROC GENMOD, with a "dscale" option that modifies the dispersion index so that the deviance/df==1.

Easy enough to do, but I need confidence intervals. I'd like to show the students how to do it without hand calculating them. Something akin to confint_default() or confint()

Data

skin_cancer <- data.frame(CASES=c(1,16,30,71,102,130,133,40,4,38,
                              119,221,259,310,226,65),
                      CITY=c(rep(0,8),rep(1,8)),
                      N=c(172875, 123065,96216,92051,72159,54722,
                          32185,8328,181343,146207,121374,111353,
                          83004,55932,29007,7583),
                      agegp=c(1:8,1:8))
skin_cancer$ln_n = log(skin_cancer$N)

The model

fit <- glm(CASES ~ CITY, family="poisson", offset=ln_n, data=skin_cancer)

Changing the dispersion index

summary(fit, dispersion= deviance(fit) / df.residual(fit)))

That gets me the "correct" standard errors (correct according to SAS). But obviously I can't run confint() on a summary() object.

Any ideas? Bonus points if you can tell me how to change the dispersion index within the model so I don't have to do it within the summary() call.

Thanks.

StupidWolf
  • 45,075
  • 17
  • 40
  • 72
Brian Carter
  • 103
  • 5

1 Answers1

1

This is an interesting question, and slightly deeper than it seems.

The simplest potential answer is to use family="quasipoisson" instead of poisson:

fitQ <- update(fit, family="quasipoisson")
confint(fitQ)

However, this won't let you adjust the dispersion to be whatever you want; it specifically changes the dispersion to the estimate R calculates in summary.glm, which is based on the Pearson chi-squared (sum of squared Pearson residuals) rather than the deviance, i.e.

sum((object$weights * object$residuals^2)[object$weights > 0])/df.r

You should be aware that stats:::confint.glm() (which actually uses MASS:::confint.glm) computes profile confidence intervals rather than Wald confidence intervals (i.e., this is not just a matter of adjusting the standard deviations).

If you're satisfied with Wald confidence intervals (which are generally less accurate) you could hack stats::confint.default() as follows (note that the dispersion title is a little bit misleading, as this function basically assumes that the original dispersion of the model is fixed to 1: this won't work as expected if you use a model that estimates dispersion).

confint_wald_glm <- function(object, parm, level=0.95, dispersion=NULL) {
    cf <- coef(object)
    pnames <- names(cf)
    if (missing(parm)) 
      parm <- pnames
    else if (is.numeric(parm)) 
      parm <- pnames[parm]
    a <- (1 - level)/2
    a <- c(a, 1 - a)
    pct <- stats:::format.perc(a, 3)
    fac <- qnorm(a)
    ci <- array(NA, dim = c(length(parm), 2L), dimnames = list(parm, 
                                                               pct))
    ses <- sqrt(diag(vcov(object)))[parm]
    if (!is.null(dispersion)) ses <- sqrt(dispersion)*ses
    ci[] <- cf[parm] + ses %o% fac
    ci
}

confint_wald_glm(fit)
confint_wald_glm(fit,dispersion=2)
Ben Bolker
  • 211,554
  • 25
  • 370
  • 453