I am trying to replicate in R a Stata xtlogit, re regression that is run on panel data. By panel data I mean that I have multiple observations for different individuals (person_id) in different years (year_id). My dependent variable (DV) is binary. I have 2 main variables of interest that I want to predict (IV1 & IV2) and a number of control variables (some_controls). In total I have about 40,000 observations.

I am a novice in using Stata, so I might just have failed to identify relevant parts of the code that feed into the xtlogit command. However, as far as I could see the relevant Stata code is as follows:

isid person_id year_id
xtset person_id year_id,y
eststo: xtlogit DV IV1 IV2 some_controls, cformat(%3.2f) pformat(%3.2f) re vsquish noomitted nolog noemptycells vce(robust)

I tried replicating this in R using the following formulas:

using the "plm" package:

plm(DV ~ IV1 + IV2 + some_controls, index = c("person_id","year_id"), model ="random", data = data_frame_name)

using the lme4 package:

glmer(DV ~ IV1 + IV2 + some_controls + (1|person_id) + (1|year_id), family = binomial, data = data_frame_name)

Unfortunately, the plm model fails to reproduce the results I get by running the Stata code. The glmer model returns the error "Error: pwrssUpdate did not converge in (maxit) iterations".

I would be thankful for suggestions on how to replicate the results calculated by the Stata code exactly.

I have found the question "Stata's xtlogit (fe, re) equivalent in R?". However, I'm not sure how the solution to that question would be applied to panel data.

Phil
  • Command `plm` in package `plm` is not for panel logit models. You would need to look at e.g. package `pglm` for that. – Helix123 Aug 06 '16 at 12:20
  • @Helix123 Thanks for the comment! I replicated the formula in pglm and that replicates the output correctly! If you post your comment as an answer I can mark it as "correct answer". One more question though: pglm takes about half an hour to calculate a model while Stata takes just a few seconds. Any idea why that might be? – Phil Aug 06 '16 at 13:23
  • Your data set is quite large and I believe the `pglm` procedure is not really optimised and is in pure R. In some "number crunching" packages the critical parts are written in C(++) for speed gains. In fact, the package seems to be in an "early" stage (see the version number), albeit the latest CRAN release is from 2013... Stata usually has fairly well optimised procedures (and can also benefit from pre-compiled code). – Helix123 Aug 06 '16 at 15:05
  • Not the question, but independent variables (not a good term, but I won't say more) are **used** for prediction; it's confusing, and possibly confused, to say that you want to predict them. – Nick Cox Jan 18 '17 at 13:48

1 Answer

The plm function in package plm does not fit panel logit models the way Stata's xtlogit does; it estimates linear panel models. For a random-effects panel logit you would need to look at e.g. package pglm and its function pglm.
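
A minimal sketch of what such a call could look like, using simulated data as a stand-in for the real panel. The data frame `df`, the coefficients used to generate it, and the choice of `R = 12` quadrature points (picked to mirror what I believe is Stata's default of 12 integration points for xtlogit) are assumptions for illustration, not taken from the question. The standard errors reported by `pglm` are not the `vce(robust)` ones, so only the point estimates should be expected to line up. The `glmer` call at the end is an optional cross-check with a single random intercept per person, which is the structure `xtlogit, re` assumes; the extra `(1|year_id)` term in the question's call adds a second, crossed random effect that the Stata model does not have.

```r
# Simulated stand-in for the real panel; every value below is made up.
set.seed(1)
n_id <- 200   # number of persons (hypothetical)
n_t  <- 5     # years per person (hypothetical)
df <- data.frame(
  person_id = rep(seq_len(n_id), each = n_t),
  year_id   = rep(2001:2005, times = n_id),
  IV1       = rnorm(n_id * n_t),
  IV2       = rnorm(n_id * n_t)
)
u <- rep(rnorm(n_id), each = n_t)  # person-level random intercept
df$DV <- rbinom(nrow(df), 1, plogis(-0.5 + 0.8 * df$IV1 - 0.4 * df$IV2 + u))

f <- DV ~ IV1 + IV2  # add the control variables here

# Random-effects panel logit, the counterpart of `xtlogit ..., re`
if (requireNamespace("pglm", quietly = TRUE)) {
  library(pglm)
  fit_pglm <- pglm(f, data = df,
                   index  = c("person_id", "year_id"),
                   model  = "random", effect = "individual",
                   family = binomial("logit"),
                   method = "bfgs",
                   R      = 12)  # number of quadrature points (assumed)
  print(summary(fit_pglm))
}

# Optional cross-check: one random intercept per person, no year effect
if (requireNamespace("lme4", quietly = TRUE)) {
  fit_glmer <- lme4::glmer(DV ~ IV1 + IV2 + (1 | person_id),
                           data = df, family = binomial, nAGQ = 12)
  print(lme4::fixef(fit_glmer))
}
```

On real data, replace `df` with your own data frame and add `some_controls` to the formula. Small differences from Stata in the later decimals would not be surprising, since the quadrature schemes may differ.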

Helix123