4

I'm trying to figure out how to perform a fixed effect logit regression in R (analogously to Stata's xtlogit command). I read of several packages such as "pglm" or "bife" but couldn't get my model to run.

My data is saved as a data frame and looks like:

ID   Time  Y  X
1    2000  1  0
1    2001  0  1
1    2002  1  1
...
1    2016  1  0
...
n

Basically I would like to run the fixed effects logit regression:

 y_jt = beta*x_jt + mu_j + pi_t + epsilon_jt

where j is the ID, t is the Time, mu the ID fixed effects, pi the time fixed effects, and epsilon the error term.

I'm open to using any package. I started with "bife" but could not figure out how to set both ID and Time fixed effects. I tried:

 mod.no <- bife(y ~ x | ID, data = panel)

Do I maybe need to set the data as a panel like Stata's "xtset" command?

Many thanks in advance!

EDIT

The Stata command I would like to replicate in R is:

xi: xtlogit Y X i.Time, fe
Neicooo
  • 197
  • 1
  • 9

2 Answers2

3

In general, I think the strategy here would be to do the following:

  1. Create a variable containing the individual-level mean for the predictor variable. This is most easily accomplished using dplyr:

    data <- data $>$ group_by(ID) %>% mutate(X_mean = mean(X))

The magic here is with the group_by function, which causes the mean operation to calculate group means rather than global means.

  1. Use lme4 to estimate the logit model as a multilevel model. Here's how I'd specify the model:

    glmer(Y ~ X + X_mean + Time + (1 | ID), family = binomial)

The terms "fixed" and "random" are really muddled between the panel data, multilevel modeling, and some other literatures, so I'm not completely clear on how you conceptualize "fixed effect of time". What this model gives you is a fixed effect of X in that the coefficient for X will represent the within-subjects effect of X. I include Time as a predictor which will either treat the year as an additional predictor whose interpretation depends on whether it is continuous or categorical. Some would fit that as a "random" effect (as in random slope or in some literatures a "growth curve"). You would do that with:

glmer(Y ~ X + X_mean + Time + (Time | ID), family = binomial)

Which estimates a different effect of time for every individual.

The (1 | ID) in the first model and (Time | ID) in the second model tells lme4 what the grouping variable is, which in your case is ID. You get random intercepts by ID in the first model and a random intercept plus random slope for Time in the second model. Another interpretation of your first post would be that you want a random intercept for Time as well, in which case you could do the following:

glmer(Y ~ X + X_mean + (1 | Time) + (1 | ID), family = binomial)

or, alternatively, if there are few waves you can get to the same place by including Time as a predictor and making that variable a factor in your input data. If there are many time points that could make the output unwieldy.

I've been working on a package to automate some of this, inspired by the xt suite from Stata, though at this juncture my package is far more limited. It's called panelr and at present must be downloaded from GitHub. More info available here: https://github.com/jacob-long/panelr

In this case, using panelr, your situation would work like this:

library(panelr)
pdata <- panel_data(data, id = ID, wave = Time)
model <- wbm(Y ~ X, data = pdata, use.wave = TRUE, family = binomial)

All panelr is doing is automating what I've explained above. You can drop the individual mean variable without affecting the estimate of the within-subject effect of X by using the model = "within" argument.

panelr is probably a few weeks away from CRAN submission at this point as a few things need documenting, there are a few edge cases where things break unexpectedly, and I want to be more flexible about the handling of time.

commscho
  • 380
  • 2
  • 7
  • Thank you very much. I will try it out. It seems though rather complicated to me. Is there no way how to do this with the pglm or bife package? – Neicooo Feb 21 '18 at 20:25
1

Perhaps try coercing the Time variable to a factor, like you are doing in Stata, and including it in the bife formula:

panel$Time <– as.factor(panel$Time)

mod.no <- bife(y ~ x + Time | ID, data = panel, bias_corr = "ana")

Note: You will probably want to correct for the incidental parameter bias with the bias_corr = "ana" at the end.

Alternatively, using the clogit function in the survival package should work too:

mod.no <– clogit(y ~ x + strata(Time) + strata(ID), data = panel

The strata() option indicates the fixed-effects.

nrjenkins
  • 13
  • 3