
I have the following dataset:

share <- c(-0.16,-0.07,0,0,-0.06,-0.06,-0.18,-0.23,-0.07,-0.24,0,0,-0.22,-0.15,0,0,-0.09,-0.2,0,-0.19,0,-0.16,-0.24,-0.14,-0.22,-0.22,0,-0.18,0,0,-0.01,0,0,-0.14,0,-0.06,0,-0.12,0,0,-0.14,0,0,0,-0.02,0,0,0,0,0,-0.19,0,-0.21,-0.08,0,0,0,-0.1,-0.17,0,0,-0.13,-0.08,-0.1,0,-0.05,-0.06,0,0,-0.1,0,0,0,-0.16,0,-0.18,-0.04,-0.08,0,-0.06,0,0,0,-0.2,0,0,-0.08,0,0,-0.01,0,-0.16,-0.08,0,0,0,-0.02,-0.18,0.17,-0.2,-0.14,0,-0.24)
grade <- c(-22.64,-2.39,-15.3,-22.34,-6.39,5.25,0.47,33.61,-0.54,-25.21,16.41,-13.94,21.04,-19.64,-7.48,-32.18,-19.48,21.65,0.88,-1.6,28.49,13.61,-1.69,19.34,2.19,-7.25,6.65,7.25,20.56,16.25,17.24,12.3,3.7,-8.64,8.25,3.27,6,14.32,0.91,8.6,-6.84,13.35,6.99,3.36,0.29,19.46,2,10.75,8.76,-6.54,36.46,28.88,0.44,4.73,3.52,-10.7,-3.88,-2.36,-15.12,9.55,-4.96,10.7,6.84,-10.98,13.03,3.46,14.65,-1.66,6.89,-11.47,-6.52,15.64,3.02,4.11,-8.41,4.77,18.97,-3.73,9.54,-6.06,13.1,-24.62,-2.5,-30.26,3.82,12.97,4,2.21,14.33,12.47,-55.11,19.86,13.7,9.19,-11.01,-2.52,13.1,3.83,1.51,4.18,-3.14,-5.85,8.39)
treat <- c(0,0,0,0,0,0,0,0,0,0,0,1,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0)
dataset <- data.frame(share, grade, treat)

I want to run the following linear model and calculate robust standard errors using the sandwich package.

library(sandwich)
model <- lm(grade ~ treat * share, data = dataset)  # treat * share expands to treat + share + treat:share
robSE <- sqrt(diag(vcovHC(model, type = "HC2")))

The output of the linear model is fine, but the robust standard errors cannot be computed: NaN is returned for every coefficient.

My guess is that the NaNs are due to the variables treat and share being almost perfectly collinear (there is only one observation in which treat = 1 and share != 0). The problem is that I must use these variables; I cannot replace them.
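If that is indeed the cause, the hat values should show it: HC2 rescales each squared residual by 1/(1 - h_ii), so an observation with leverage h_ii = 1 is fitted exactly and the estimator computes 0/0 = NaN. A quick diagnostic sketch (hatvalues() comes with base R's stats package):

h <- hatvalues(model)    # leverage of each observation
which(h > 1 - 1e-8)      # observations with h_ii = 1, i.e. fitted exactly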

Can someone think of a workaround/solution to this issue?

  • Do you need "HC2" absolutely? HC0 and HC1 give non-NaN results (btw you might get better luck on Cross Validated, a site more stats focused than SO) – Dominic Comtois Mar 13 '15 at 08:58
  • @DominicComtois: You are right twice, HC1 works and Cross Validated would be better. I did some reading and will use HC1. I will post the rationale of such a choice as an answer to this question. Thanks a lot. – tiago Mar 13 '15 at 16:51

1 Answer


After doing some reading and following @DominicComtois's suggestion, I think I will have to go with HC1 here. HC3, HC4 (Long and Ervin 2000; Hayes and Cai 2007), or HC4m (Cribari-Neto and da Silva 2011) would be better, but they all return NaN for the same reason as HC2: they divide the squared residuals by powers of (1 - h_ii), which is zero for the leverage-1 observation. The problem with HC0 is that it tends to be biased in small to moderately large samples (Hayes and Cai 2007; Cribari-Neto and da Silva 2011).
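For reference, a minimal sketch of the HC1 computation (the coeftest() call assumes the lmtest package is installed; any routine that accepts a user-supplied covariance matrix would do as well):

library(sandwich)
library(lmtest)  # assumption: installed; provides coeftest()

robSE <- sqrt(diag(vcovHC(model, type = "HC1")))     # HC1 robust standard errors
coeftest(model, vcov = vcovHC(model, type = "HC1"))  # coefficient tests with the HC1 covariance matrix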

References

Cribari-Neto, F., & da Silva, W. B. (2011). A new heteroskedasticity-consistent covariance matrix estimator for the linear regression model. AStA Advances in Statistical Analysis, 95, 129–146.

Hayes, A. F., & Cai, L. (2007). Using heteroscedasticity-consistent standard error estimators in OLS regression: An introduction and software implementation. Behavior Research Methods, 39, 709–722.

Long, J. S., & Ervin, L. H. (2000). Using heteroscedasticity-consistent standard errors in the linear regression model. The American Statistician, 54, 217–224.
