1

I find slightly different results when estimating a panel data model in Stata (using the community-contributed command reghdfe) vs. R.

Stata:

cls
webuse nlswork, clear
xtset idcode year
reghdfe ln_w grade age ttl_exp tenure not_smsa south, abs(year)  cluster(idcode)

R:

## import data
library(foreign)   
df = read_dta("http://www.stata-press.com/data/r14/nlswork.dta")

## estimate the model
model5 = plm( ln_wage ~   grade + age + ttl_exp + tenure+  not_smsa  + south + as.factor(year), data=df, index=c('idcode', 'year'), model="random")
summary(model5)[1:7,1:4]  #  <- this gives unclustered errors
coeftest(model5, vcov=vcovHC(model5,type="HC0",cluster="group"))[1:7,1:4] # <- this gives clustered errors

I would have expected the same coefficients (standard errors still need Degrees-of-freedom correction as well I guess). What am I missing?

safex
  • 2,398
  • 17
  • 40
  • 1
    See this [blog site](http://www.matthieugomez.com/statar/regressions.html) of R and Stata modeling comparison. Do note: you are not using `xtreg` but [`reghdfe`](http://scorreia.com/software/reghdfe/), a 3rd party package which is not standard panel estimation but applies various algorithms which can underpin the differences. – Parfait Dec 06 '18 at 17:45

1 Answers1

2

After tweaking a bit, I find that R's plm package can use multiple fixed effects (at least on both index levels)

## estimate the model
model5 = plm( ln_wage ~   grade + age + ttl_exp + tenure+    not_smsa  + south + as.factor(year), data=df, index=c('idcode',  'year'), model="with", effect="time")
summary(model5)[1:7,1:4]  #  <- this gives unclustered errors
coeftest(model5, vcov=vcovHC(model5,type="HC0",cluster="group"))   [1:7,1:4] # <- this gives clustered errors

The above equals time fixed effects and numerically resembles Statas reghdfe command

 reghdfe ln_w grade age ttl_exp tenure not_smsa south, abs(year)  cluster(idcode)

Similarly, if you wanted both fixed effects where in Stata you would:

 reghdfe ln_w grade age ttl_exp tenure not_smsa south, abs(idcode year)  cluster(idcode) 

in R you can use:

model5 = plm( ln_wage ~   grade + age + ttl_exp + tenure+    not_smsa  + south + as.factor(year), data=df, index=c('idcode',  'year'), model="with", effect="twoways")
summary(model5)  #  <- this gives unclustered errors
coeftest(model5, vcov=vcovHC(model5,type="HC0",cluster="group"))    # <- this gives clustered errors
safex
  • 2,398
  • 17
  • 40