Different Robust Standard Errors of Poisson regression in Stata and R

Question

i am not getting the same results by following Hilbe's, J. 2011 procedure (herein referred to as "the book") on page 20. The procedure is for computing Poisson regression with robust standard errors using the titanic data set in glm R. Hilbe's source code is in Table 2.4 according to the following link: Negative Binomial Regression Second edition Errata 2012

I believe Some changes to the titanic dataset has occurred since it was published here is the procedure and Stata results from the book and what was performed, which resulted in incorrect or different results in R as of july 2021:

library(COUNT)
data("titanic")
attach(titanic)
library(gmodels)

str(titanic)
'data.frame':   1316 obs. of  4 variables:
 $ class   : Factor w/ 3 levels "3rd class","1st class",..: 2 2 2 2 2 2 2 2 2 2 ...
 $ age     : Factor w/ 2 levels "child","adults": 2 2 2 2 2 2 2 2 2 2 ...
  ..- attr(*, "label")= chr "0=child; 1=adult"
  ..- attr(*, "format")= chr "%10.0g"
  ..- attr(*, "value.label.table")= Named int  0 1
  .. ..- attr(*, "names")= chr  "child" "adults"
 $ sex     : Factor w/ 2 levels "women","man": 2 2 2 2 2 2 2 2 2 2 ...
  ..- attr(*, "label")= chr "gender: 0=female; 1=male"
  ..- attr(*, "format")= chr "%8.0g"
  ..- attr(*, "value.label.table")= Named int  0 1
  .. ..- attr(*, "names")= chr  "women" "man"
 $ survived: num  2 2 2 2 2 2 2 2 2 2 ...
 - attr(*, "stata.info")=List of 5
  ..$ datalabel  : chr "Hilbe, Modeling Count Data (CUP, 2014)"
  ..$ version    : int 12
  ..$ time.stamp : chr "14 Jul 2014 15:12"
  ..$ val.labels : chr  "class" "age" "sex" "survived"
  ..$ label.table:List of 4
  .. ..$ class   : Named int  1 2 3 4
  .. .. ..- attr(*, "names")= chr  "1st class" "2nd class" "3rd class" "crew"
  .. ..$ age     : Named int  0 1
  .. .. ..- attr(*, "names")= chr  "child" "adults"
  .. ..$ sex     : Named int  0 1
  .. .. ..- attr(*, "names")= chr  "women" "man"
  .. ..$ survived: Named int  0 1
  .. .. ..- attr(*, "names")= chr  "no" "yes"

the book relevels the class.

titanic$class <- relevel(factor(titanic$class), ref=3)

however, as of 2021 "survive" has become a factor opposed to what i believe used to be a binary 0="no" and 1="yes" integers, therefore, the survive is recoded accordingly

titanic$survived <- as.character(titanic$survived)
titanic$survived [which(titanic$survived =="no")] <- "0"
titanic$survived [which(titanic$survived =="yes")] <- "1"
titanic$survived <- as.integer(titanic$survived)

Code from the 2012 errata:

tit3 <- glm(survived ~ factor(class), family=poisson, data=titanic)
irr <- exp(coef(tit3)) # vector of IRRs
library("sandwich")
rse <- sqrt(diag(vcovHC(tit3, type="HC0"))) # coef robust SEs
irr*rse # IRR robust SEs

irr*rse output in R console

 (Intercept) factor(class)1st class factor(class)2nd class 
            0.01634255             0.19270871             0.15723303

using summary function

> summary(tit3)

Call:
glm(formula = survived ~ factor(class), family = poisson, data = titanic)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-1.1177  -0.7101  -0.7101   0.4364   1.1225  

Coefficients:
                       Estimate Std. Error z value Pr(>|z|)    
(Intercept)            -1.37783    0.07495 -18.384  < 2e-16 ***
factor(class)1st class  0.90721    0.10268   8.835  < 2e-16 ***
factor(class)2nd class  0.49603    0.11871   4.179 2.93e-05 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 967.81  on 1315  degrees of freedom
Residual deviance: 889.69  on 1313  degrees of freedom
AIC: 1893.7

Number of Fisher Scoring iterations: 5

The following is believed to be the correct estimates because its what is n the book. The Incident Rate Ratio (IRR)is :

class2: 1.642184 
class1: 2.477407

and estimates and Robust Std. Err.

class2: 0.1572928
class1: 0.192782

all have P>|z| == 0. Can someone confirm this? Thank you

score 0 · Answer 1 · answered Jul 06 '21 at 04:07

0

Confirmed!

data('titanic', package="COUNT")
titanic <- transform(titanic, survived=as.numeric(survived) - 1,
                     class=relevel(class, ref=3))

tit3 <- glm(survived ~ class, family=poisson, data=titanic)

library(sandwich);library(lmtest)
(smy <- coeftest(tit3, vcovHC(tit3, type="HC0")))
# z test of coefficients:
#   
#                 Estimate Std. Error  z value  Pr(>|z|)    
# (Intercept)    -1.377832   0.064819 -21.2565 < 2.2e-16 ***
# class1st class  0.907212   0.077786  11.6629 < 2.2e-16 ***
# class2nd class  0.496027   0.095746   5.1806 2.211e-07 ***
# ---
# Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(irr <- exp(coef(tit3)))
# (Intercept) class1st class class2nd class 
#   0.2521246      2.4774071      1.6421841 

rse <- sqrt(diag(vcovHC(tit3, type="HC0")))
irr*rse
# (Intercept) class1st class class2nd class 
#  0.01634255     0.19270871     0.15723303

answered Jul 06 '21 at 04:07

jay.sf

60,139
8
53
110

i can see the mistake i made reading the output. thanks for simplifying the code. if you will, i have a follow up question. is the transform function in base R and what does the '-1' mean? – cn838 Jul 06 '21 at 18:10
@Chris.530 Sure base R. `as.numeric(survived)` gives the underlying integers of the factor (which is in fact labeled integers) minus one gives accidently easily the wanted no=0 yes=1 structure. – jay.sf Jul 06 '21 at 18:16
is preordained to two binary terms or characters like 'a' and 'b' are '0' and '1' respectively. I could not find documentation on this command in help(transform) in R. – cn838 Jul 06 '21 at 18:33
`transform` transforms just the data frame. Consider `?factor`, the warning at the bottom, but that's exactly what we want in this case. Or try `f <- factor(c(2, 1, 2, 2, 1), levels=1:2, labels=c('no', 'yes'));as.numeric(f) - 1` - that's how `survived` was constructed. – jay.sf Jul 06 '21 at 18:41
Hi Jay, thanks for responding. Should i repost the question. I believe the issue is that upon copying code from a book or some other medium and entering it through mark down (ie the green button) the code is executed without showing the intermediary commands (or in this case computations 'irr') which functions a little different from what one would expect from a calculator. – cn838 Jul 08 '21 at 13:55

Different Robust Standard Errors of Poisson regression in Stata and R

1 Answers1