I am trying to convert Stata code to R. I have the following code in Stata:

tsset code_numeric year_numeric
sort code_numeric year_numeric
tab year, gen(yr)
tab code, gen(cd)
reg fhpolrigaug L.(fhpolrigaug lrgdpch) yr* cd* if sample==1, cluster(code)

Does anyone know which syntax I should use in R to get the same result as in Stata? Specifically, I mean the `tab year, gen(yr)`, `tab code, gen(cd)` and `reg fhpolrigaug L.(fhpolrigaug lrgdpch) yr* cd* if sample==1, cluster(code)` parts. The dataset I used is available at: https://www.openicpsr.org/openicpsr/project/113251/version/V1/view

w12345678
  • This can be separated into two parts. `tabulate` with the `generate()` option (a) shows a table of frequencies of the distinct values of the specified variable (`year`, or whatever) and (b) produces indicator variables for those distinct values. Even in Stata, (b) is largely historical, since you can signal to most model-fitting commands that a variable is to be treated as a series of indicators, and (I am not an R person) as I understand it that is utterly standard in R too. (a) might be what you seek anyway, but I guess there are many ways to do it in R. – Nick Cox Jun 09 '20 at 13:16
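The two halves of `tab year, gen(yr)` map onto two base-R functions. A minimal sketch on a made-up toy panel (the values are invented, not taken from the AER data): `table()` gives the frequency table, and `model.matrix()` with `- 1` (no intercept) gives one 0/1 column per distinct value. The columns are renamed to `yr1`, `yr2`, ... here only to mimic Stata's numbering in sorted order; those names are an illustrative choice.

```r
# Made-up toy panel standing in for the real data
df <- data.frame(
  code = c("ARG", "ARG", "ARG", "BRA", "BRA", "CHL"),
  year = c(1970, 1980, 1990, 1970, 1980, 1990)
)

# (a) frequencies of each distinct value, like the table printed by -tab year-
print(table(df$year))

# (b) one 0/1 indicator per distinct value (levels in sorted order),
#     like the variables created by gen(yr) and gen(cd)
yr <- model.matrix(~ factor(year) - 1, data = df)
colnames(yr) <- paste0("yr", seq_len(ncol(yr)))
cd <- model.matrix(~ factor(code) - 1, data = df)
colnames(cd) <- paste0("cd", seq_len(ncol(cd)))

# attach the indicator columns to the data, as Stata does
df_dummies <- cbind(df, yr, cd)
print(df_dummies)
```

In practice the explicit columns are rarely needed: writing `factor(year) + factor(code)` in an `lm()` formula makes R build the indicators internally (dropping one level of each to avoid collinearity with the intercept), much as Stata's `i.year` factor-variable notation does.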

I guess this is more or less what you want to achieve:

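setwd("your_directory/")
library(plm)     # panel-data models: pdata.frame(), plm()
library(readxl)  # read_excel()
df <- read_excel("Income-and-Democracy-Data-AER-adjustment.xls", sheet = "10 Year Panel")

# declare the panel structure (the counterpart of Stata's tsset)
df <- pdata.frame(df, index = c("code", "year"), drop.index = FALSE)

# build the lags on the FULL panel before restricting to sample == 1;
# Stata's L. operator also looks up the previous period in the full data,
# even when that period itself has sample != 1
df$fhpolrigaug_lag <- lag(df$fhpolrigaug)
df$lrgdpch_lag <- lag(df$lrgdpch)

# two-way fixed effects (country and year) play the role of the cd* and yr* dummies
regModel <- plm(fhpolrigaug ~ fhpolrigaug_lag + lrgdpch_lag,
                data = subset(df, sample == 1), index = c("code", "year"),
                model = "within", effect = "twoways")
summary(regModel)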

#for more info, look here: https://philippbroniecki.com/statistics1/seminar10.html
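For comparison, here is a base-R sketch of the complete Stata regression, fitted as a dummy-variable (LSDV) `lm()` instead of with `plm`. It assumes, as the `tsset code_numeric year_numeric` line suggests, that `year_numeric` is a consecutive integer period index; `panel_lag()` is a hypothetical helper, and the data are simulated, so the coefficients themselves mean nothing. What it illustrates is ordering: Stata's `L.` looks up the previous period in the full dataset, while `if sample==1` only restricts which rows enter the regression. Lagging after subsetting (for example, calling `lag()` inside a `plm()` formula whose `data` is already `subset(df, sample==1)`) silently drops observations, which could be one reason coefficients fail to match.

```r
# Hypothetical helper: value of x in period time - 1 for the same unit,
# NA if that period is absent (mimics Stata's L. under -tsset id time-)
panel_lag <- function(x, id, time) {
  key_now  <- paste(id, time)
  key_prev <- paste(id, time - 1L)
  x[match(key_prev, key_now)]
}

# Simulated balanced panel (made-up numbers, NOT the AER data)
set.seed(1)
df <- expand.grid(code = paste0("c", 1:20), year_numeric = 1:6,
                  stringsAsFactors = FALSE)
df$lrgdpch     <- rnorm(nrow(df))
df$fhpolrigaug <- runif(nrow(df))
df$sample      <- as.integer(df$year_numeric > 1)  # first period outside the sample

# Stata order: build the lags on the FULL data, then restrict to sample == 1
df$L_fhpolrigaug <- panel_lag(df$fhpolrigaug, df$code, df$year_numeric)
df$L_lrgdpch     <- panel_lag(df$lrgdpch, df$code, df$year_numeric)
fit <- lm(fhpolrigaug ~ L_fhpolrigaug + L_lrgdpch +
            factor(year_numeric) + factor(code),
          data = subset(df, sample == 1))

# Subsetting first and lagging afterwards loses the first in-sample period of
# every country, because its lag points at a row that was removed
d1 <- subset(df, sample == 1)
d1$L_fhpolrigaug <- panel_lag(d1$fhpolrigaug, d1$code, d1$year_numeric)
d1$L_lrgdpch     <- panel_lag(d1$lrgdpch, d1$code, d1$year_numeric)
fit_subset_first <- lm(fhpolrigaug ~ L_fhpolrigaug + L_lrgdpch +
                         factor(year_numeric) + factor(code), data = d1)

# rows actually used: 100 vs 80
c(stata_order = nobs(fit), subset_first = nobs(fit_subset_first))
```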

There is one thing left; you will have to find out on your own how to cluster the standard errors.
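Stata's `cluster(code)` can also be reproduced by hand in base R. The sketch below (hypothetical helper `cluster_vcov()`, simulated data) builds the cluster-robust sandwich estimator with the small-sample factor G/(G-1) * (N-1)/(N-K) that `regress, cluster()` applies, where G is the number of clusters, N the number of observations and K the number of estimated coefficients, dummies included. If you prefer packages, `sandwich::vcovCL(fit, cluster = ~code, type = "HC1")` combined with `lmtest::coeftest()` is the usual route and should give the same numbers for an `lm()` fit, though that is worth checking on your own data.

```r
# Hypothetical helper: cluster-robust variance matrix for an lm() fit, using
# the small-sample factor G/(G-1) * (N-1)/(N-K) of Stata's regress, cluster()
cluster_vcov <- function(fit, cl) {
  X <- model.matrix(fit)
  X <- X[, !is.na(coef(fit)), drop = FALSE]  # drop aliased (collinear) columns
  u <- residuals(fit)
  N <- nrow(X)
  K <- ncol(X)
  G <- length(unique(cl))
  bread  <- solve(crossprod(X))              # (X'X)^-1
  scores <- rowsum(X * u, group = cl)        # per-cluster sums of x_i * u_i
  meat   <- crossprod(scores)                # sum over clusters of s_g s_g'
  (G / (G - 1)) * ((N - 1) / (N - K)) * (bread %*% meat %*% bread)
}

# Simulated panel (made-up data): 15 countries observed in 4 periods
set.seed(42)
toy <- data.frame(code = rep(paste0("c", 1:15), each = 4),
                  year = rep(1:4, times = 15))
toy$x <- rnorm(nrow(toy))
toy$y <- 0.5 * toy$x + rnorm(nrow(toy))

fit <- lm(y ~ x + factor(year) + factor(code), data = toy)

# cluster id for exactly the rows lm() used (row names survive NA removal)
cl <- toy[rownames(model.matrix(fit)), "code"]
V  <- cluster_vcov(fit, cl)
b  <- coef(fit)[!is.na(coef(fit))]
cbind(estimate = b, cluster_se = sqrt(diag(V)))["x", ]
```

Because K counts every dummy, these are the standard errors of the LSDV fit; a within-transformed `plm()` fit may apply a different degrees-of-freedom correction, so its clustered standard errors can differ slightly even when the coefficients agree.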

PS: Here is a nice side-by-side comparison of R and Stata commands: http://rslblissett.com/wp-content/uploads/2016/09/sidebyside_130826.pdf

glaucon
  • Thank you for the PDF file! I have already tried your code, but the coefficients differ from the Stata ones... – w12345678 Jun 10 '20 at 10:28
  • I am also a native Stata user who has had to work with R only sporadically. In general, the code should replicate your results, since including the yr and cd dummies in the regression gives the same slope coefficients as including year and code fixed effects in plm. Probably I misspecified the model. I will leave you the task of finding the correct model specification. – glaucon Jun 10 '20 at 11:12
  • For more information, the regression model is `d_it = alpha * d_i,t-1 + gamma * y_i,t-1 + mu_t + delta_i + u_it`, where d_it is measured by fhpolrigaug, y_it by lrgdpch, mu_t and delta_i are the time and country effects (the yr* and cd* indicator variables), and u_it is an error term that is i.i.d. and normally distributed. Can this also be done in R? Thank you – w12345678 Jun 10 '20 at 12:49
  • Of course, this can be done in R. Please have a look at https://philippbroniecki.com/statistics1/seminar10.html for more details on how to run a fixed-effects regression with panel data in R. – glaucon Jun 10 '20 at 14:47
  • Thank you so much! – w12345678 Jun 12 '20 at 22:40
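The claim that the yr*/cd* dummies and two-way fixed effects give the same slope coefficients is an instance of the Frisch-Waugh-Lovell theorem. A small base-R check on simulated data (hypothetical helper `demean2()`): on a balanced panel, subtracting unit means and period means and adding back the grand mean removes exactly what the dummies absorb, so the slope on `x` matches the dummy-variable estimate. For an unbalanced panel this simple double-demeaning is not exact and a dedicated routine such as `plm(..., model = "within", effect = "twoways")` is needed; in either case the coefficients can only match Stata if both fits use the same estimation sample, lags included.

```r
# Simulated balanced panel (made-up data): 10 units x 5 periods
set.seed(7)
p <- expand.grid(id = 1:10, t = 1:5)
p$x <- rnorm(nrow(p)) + p$id / 5             # regressor correlated with the unit effect
p$y <- 0.8 * p$x + p$id / 2 + p$t / 3 + rnorm(nrow(p))

# (1) dummy-variable regression, like -reg y x yr* cd*- in Stata
b_lsdv <- coef(lm(y ~ x + factor(id) + factor(t), data = p))[["x"]]

# (2) two-way within transformation: subtract unit and period means and add
#     back the grand mean (exact only for a balanced panel)
demean2 <- function(v, id, t) v - ave(v, id) - ave(v, t) + mean(v)
p$y_w <- demean2(p$y, p$id, p$t)
p$x_w <- demean2(p$x, p$id, p$t)
b_within <- coef(lm(y_w ~ x_w - 1, data = p))[["x_w"]]

all.equal(b_lsdv, b_within)   # TRUE: same slope either way
```

Only the coefficients should be compared this way: the plain `lm()` on demeaned data does not account for the degrees of freedom used up by the means, so its standard errors are not the fixed-effects ones.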