
I can't reproduce the results of Stata's xtreg command in R unless I use the fe option in Stata. The coefficients agree between Stata and R for a standard (pooled OLS) regression and for a panel model with fixed effects.

Sample data:

library("plm")
data("Cigar", package = "plm")
z <- Cigar[Cigar$year %in% c(63, 73), ]

# save the data so it can be used in Stata
foreign::write.dta(z, "C:/Users/matthewr/Desktop/temp.dta")

So I get the same coefficient with this in R:

coef(lm(sales ~ pop, data = z))

and this in Stata:

use "C:/Users/matthewr/Desktop/temp.dta" , clear   
reg sales pop

It also works when I set up a panel and use the fixed-effects option:

z2 <- pdata.frame(z, index = c("state", "year"))
coef(plm(sales ~ pop, data = z2, model = "within"))  # matches xtreg, fe

This matches the following in Stata:

xtset state year
xtreg sales pop, fe
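The within (fe) estimate that plm and xtreg, fe agree on can also be reproduced in base R. A minimal sketch on simulated data (not the Cigar data; all variable names here are illustrative): OLS on group-demeaned variables gives the same slope as OLS with one dummy variable per group.

```r
# Simulated panel (illustrative only, not the Cigar data)
set.seed(42)
n_id <- 10; n_per <- 3
id <- factor(rep(seq_len(n_id), each = n_per))
a  <- rep(rnorm(n_id), each = n_per)   # unobserved group effects
x  <- rnorm(n_id * n_per) + a          # regressor correlated with the effects
y  <- 2 - 0.7 * x + 3 * a + rnorm(n_id * n_per)

# Within estimator: demean y and x by group, then OLS without an intercept
b_within <- coef(lm(I(y - ave(y, id)) ~ 0 + I(x - ave(x, id))))[[1]]

# LSDV: OLS with one dummy per group gives the same slope
b_lsdv <- coef(lm(y ~ x + id))[["x"]]

all.equal(b_within, b_lsdv)  # TRUE
```

The equality is the Frisch-Waugh-Lovell result: partialling out the group dummies is exactly group demeaning.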

I can't figure out how to match Stata when I am not using the fixed-effects option. This is the result I would like to reproduce in R (coefficient on pop: -.0006838):

  xtreg sales pop
Nick Cox
MatthewR

2 Answers


In Stata, xtreg y x is equivalent to xtreg y x, re (random effects is the default), so what you want is a random-effects model, which in plm is model = "random":

summary(plm(sales ~ pop, data = z, model = "random", index = c("state", "year")))$coefficients
#                  Estimate  Std. Error   z-value     Pr(>|z|)
# (Intercept)  1.311398e+02 6.499511330 20.176878 1.563130e-90
# pop         -6.837769e-04 0.001077432 -0.634636 5.256658e-01

Stata:

xtreg sales pop, re

       sales |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         pop |  -.0006838   .0010774    -0.63   0.526    -.0027955     .001428
       _cons |   131.1398   6.499511    20.18   0.000      118.401    143.8787
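To make the random-effects step concrete, here is a minimal base-R sketch of the one-way random-effects GLS estimator for a balanced panel: Swamy-Arora variance components from the within and between regressions, then OLS on quasi-demeaned data. The function re_swamy_arora() and the simulated data are illustrative assumptions, not part of plm or Stata. The sketch handles a single regressor only and uses one common degrees-of-freedom convention (NT - N - K for the within residuals, N - K - 1 for the between residuals), so it is not guaranteed to reproduce plm or xtreg exactly.

```r
# Hypothetical helper (not from plm or Stata): one-way random-effects GLS
# for a balanced panel with a single regressor x.
re_swamy_arora <- function(y, x, id) {
  id  <- factor(id)
  n   <- nlevels(id)                  # number of groups
  n_t <- length(y) / n                # periods per group
  if (any(table(id) != n_t)) stop("this sketch assumes a balanced panel")
  k   <- 1                            # number of slope coefficients

  # Within regression: group-demeaned data, no intercept
  e_w <- resid(lm(I(y - ave(y, id)) ~ 0 + I(x - ave(x, id))))
  sigma2_e <- sum(e_w^2) / (n * n_t - n - k)

  # Between regression: group means, with intercept
  y_b <- as.numeric(tapply(y, id, mean))
  x_b <- as.numeric(tapply(x, id, mean))
  e_b <- resid(lm(y_b ~ x_b))
  sigma2_1 <- n_t * sum(e_b^2) / (n - k - 1)  # estimates n_t*sigma2_u + sigma2_e

  # GLS weight; clamp at 0 if the between variance is too small
  theta <- max(0, 1 - sqrt(sigma2_e / sigma2_1))

  # OLS on quasi-demeaned data; the constant column becomes (1 - theta)
  y_s <- y - theta * ave(y, id)
  x_s <- x - theta * ave(x, id)
  c_s <- rep(1 - theta, length(y))
  est <- coef(lm(y_s ~ 0 + c_s + x_s))
  names(est) <- c("(Intercept)", "x")

  list(coef = est, theta = theta,
       sigma_e = sqrt(sigma2_e),
       sigma_u = sqrt(max(0, (sigma2_1 - sigma2_e) / n_t)))
}

# Simulated balanced panel (illustrative only)
set.seed(1)
n_id <- 50; n_per <- 4
id <- rep(seq_len(n_id), each = n_per)
u  <- rep(rnorm(n_id, sd = 2), each = n_per)   # group effects, sd 2
x  <- rnorm(n_id * n_per)
y  <- 1 + 0.5 * x + u + rnorm(n_id * n_per)    # idiosyncratic sd 1
res <- re_swamy_arora(y, x, id)
res$coef
res$theta
```

Setting theta to 0 would give pooled OLS (reg sales pop) and theta = 1 the within estimator (xtreg, fe), which is why the default xtreg result generally differs from both. In an unbalanced panel theta varies with each group's number of periods, and the comments on this answer note that plm's default and Stata estimate the variance components differently in that case.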
jay.sf
  • Yes, this answer is correct. It did not work on my actual data, but your answer helped me figure out why: plm and xtreg handle unbalanced panels differently. When I subset both to states observed in all time periods, the coefficients matched. Any ideas on how to handle this? @jay.sf – MatthewR Jan 27 '20 at 02:00
  • A short explanation is given in plm's vignette under the heading "Unbalanced Panels": "The default employed is what the original paper for the unbalanced one-way Swamy-Arora estimator defined (in Baltagi and Chang (1994), p. 73). A more detailed analysis of Stata’s Swamy-Arora estimation procedure is given by Cottrell (2017)." – Helix123 Dec 13 '20 at 10:46
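The workaround in MatthewR's comment (keeping only states observed in every period) can be sketched in base R as follows. keep_balanced() is a hypothetical helper, not a plm function, and the toy data frame is made up for illustration.

```r
# Hypothetical helper (not a plm function): keep only the panel units
# observed in every distinct time period present in the data.
keep_balanced <- function(df, id, time) {
  n_periods <- length(unique(df[[time]]))
  # number of distinct periods observed for each unit
  periods_per_unit <- tapply(df[[time]], df[[id]],
                             function(t) length(unique(t)))
  complete_ids <- names(periods_per_unit)[which(periods_per_unit == n_periods)]
  df[as.character(df[[id]]) %in% complete_ids, , drop = FALSE]
}

# Toy unbalanced panel: state 2 is missing year 73
df <- data.frame(state = c(1, 1, 2, 3, 3),
                 year  = c(63, 73, 63, 63, 73),
                 sales = c(10, 12, 9, 11, 14))
b <- keep_balanced(df, "state", "year")
b   # rows for states 1 and 3 only
```

Calling something like keep_balanced(z, "state", "year") on the real data before writing the .dta file and fitting plm would reproduce that workaround, but note that it discards every state with a gap rather than reconciling how the two programs treat unbalanced panels.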

Your question has already been answered by @jay.sf. I'll just add something that may not answer it directly. Both Stata's xtreg and R's plm have several options, and the RStata package can be a convenient way to try different options and compare the Stata and R results directly in RStudio. The Stata path below is specific to my computer.

library("plm" )
library(RStata)
data("Cigar", package = "plm")
z <- Cigar[ Cigar$year %in% c( 63, 73) , ]

options("RStata.StataPath" = "\"C:\\Program Files (x86)\\Stata14\\StataSE-64\"")
options("RStata.StataVersion" = 14)

# Stata fe 
stata_do1 <- '
  xtset state year
  xtreg sales pop, fe
'
stata(stata_do1, data.out = TRUE, data.in = z)
#> . 
#> .   xtset state year
#>        panel variable:  state (strongly balanced)
#>         time variable:  year, 63 to 73, but with gaps
#>                 delta:  1 unit
#> .   xtreg sales pop, fe
#> 
#> Fixed-effects (within) regression               Number of obs     =         92
#> Group variable: state                           Number of groups  =         46
#> 
#> R-sq:                                           Obs per group:
#>      within  = 0.0118                                         min =          2
#>      between = 0.0049                                         avg =        2.0
#>      overall = 0.0048                                         max =          2
#> 
#>                                                 F(1,45)           =       0.54
#> corr(u_i, Xb)  = -0.3405                        Prob > F          =     0.4676
#> 
#> ------------------------------------------------------------------------------
#>        sales |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
#> -------------+----------------------------------------------------------------
#>          pop |  -.0032108   .0043826    -0.73   0.468    -.0120378    .0056162
#>        _cons |   141.5186   18.06909     7.83   0.000     105.1256    177.9116
#> -------------+----------------------------------------------------------------
#>      sigma_u |  34.093409
#>      sigma_e |  15.183908
#>          rho |  .83448264   (fraction of variance due to u_i)
#> ------------------------------------------------------------------------------
#> F test that all u_i=0: F(45, 45) = 8.91                      Prob > F = 0.0000

# R 
z2 <- pdata.frame( z , index=c("state", "year")  )    
coef( plm( sales ~ pop , data= z2  , model="within" ) )
#>          pop 
#> -0.003210817

# Stata re
stata_do2 <- '
  xtset state year
  xtreg sales pop, re
'
stata(stata_do2, data.out = TRUE, data.in = z)
#> . 
#> .   xtset state year
#>        panel variable:  state (strongly balanced)
#>         time variable:  year, 63 to 73, but with gaps
#>                 delta:  1 unit
#> .   xtreg sales pop, re
#> 
#> Random-effects GLS regression                   Number of obs     =         92
#> Group variable: state                           Number of groups  =         46
#> 
#> R-sq:                                           Obs per group:
#>      within  = 0.0118                                         min =          2
#>      between = 0.0049                                         avg =        2.0
#>      overall = 0.0048                                         max =          2
#> 
#>                                                 Wald chi2(1)      =       0.40
#> corr(u_i, X)   = 0 (assumed)                    Prob > chi2       =     0.5257
#> 
#> ------------------------------------------------------------------------------
#>        sales |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
#> -------------+----------------------------------------------------------------
#>          pop |  -.0006838   .0010774    -0.63   0.526    -.0027955     .001428
#>        _cons |   131.1398   6.499511    20.18   0.000      118.401    143.8787
#> -------------+----------------------------------------------------------------
#>      sigma_u |  30.573218
#>      sigma_e |  15.183908
#>          rho |  .80214841   (fraction of variance due to u_i)
#> ------------------------------------------------------------------------------

# R random
coef(plm(sales ~ pop, 
            data=z, 
            model="random", 
            index=c("state", "year")))
#>   (Intercept)           pop 
#>  1.311398e+02 -6.837769e-04

Created on 2020-01-27 by the reprex package (v0.3.0)

Zhiqiang Wang