In general, I think the strategy here would be to do the following:
Create a variable containing the individual-level mean for the predictor variable. This is most easily accomplished using dplyr:
data <- data %>% group_by(ID) %>% mutate(X_mean = mean(X))
The magic here is the group_by() function, which makes mean() compute a separate mean for each ID rather than one global mean.
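As a quick check of what that step does, here is a minimal sketch on a made-up data frame (the name toy and its values are hypothetical, not from your data). Each row should end up carrying the mean of X for its own ID:

```r
library(dplyr)

# Hypothetical toy panel: 2 individuals observed over 3 waves
toy <- data.frame(
  ID   = c(1, 1, 1, 2, 2, 2),
  Time = c(1, 2, 3, 1, 2, 3),
  X    = c(1, 2, 3, 10, 20, 30)
)

toy <- toy %>%
  group_by(ID) %>%
  mutate(X_mean = mean(X)) %>%  # mean within each ID, repeated on every row of that ID
  ungroup()                     # drop the grouping so later steps are not grouped

toy$X_mean
# 2 2 2 20 20 20 (the global mean of X would have been 11)
```

The ungroup() at the end is optional here; I add it only so the grouping does not silently carry over into later dplyr steps.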
Use lme4 to estimate the logit model as a multilevel model. Here's how I'd specify the model:
glmer(Y ~ X + X_mean + Time + (1 | ID), data = data, family = binomial)
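Written out, this specification corresponds roughly to the equation below. This is a sketch that assumes Time enters as a single continuous covariate; the symbols (the betas, u_i, sigma_u) are my notation, not anything lme4 prints. The second line just regroups terms to show why the coefficient on X can be read as the within-individual effect once X_mean is in the model:

```latex
\begin{aligned}
\operatorname{logit}\Pr(Y_{it} = 1)
  &= \beta_0 + \beta_1 X_{it} + \beta_2 \bar{X}_i + \beta_3 \mathrm{Time}_{it} + u_i,
  \qquad u_i \sim N(0, \sigma_u^2) \\
  &= \beta_0 + \beta_1 \,(X_{it} - \bar{X}_i) + (\beta_1 + \beta_2)\, \bar{X}_i
     + \beta_3 \mathrm{Time}_{it} + u_i
\end{aligned}
```

So \beta_1 multiplies the deviation of X from the individual's own mean, \beta_1 + \beta_2 is the between-individual effect, and \beta_2 on its own is the difference between the two.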
The terms "fixed" and "random" are really muddled between the panel data, multilevel modeling, and some other literatures, so I'm not completely clear on how you conceptualize "fixed effect of time". What this model gives you is a fixed effect of X
in that the coefficient for X
will represent the within-subjects effect of X
. I include Time
as a predictor, so the year enters as an additional covariate whose interpretation depends on whether it is coded as continuous or categorical. Some would instead fit time as a "random" effect (a random slope, or in some literatures a "growth curve"). You would do that with:
glmer(Y ~ X + X_mean + Time + (Time | ID), data = data, family = binomial)
This lets the effect of time vary from individual to individual.
The (1 | ID)
in the first model and (Time | ID)
in the second model tell lme4
what the grouping variable is, which in your case is ID
. You get random intercepts by ID
in the first model and a random intercept plus random slope for Time
in the second model. Another interpretation of your first post would be that you want a random intercept for Time
as well, in which case you could do the following:
glmer(Y ~ X + X_mean + (1 | Time) + (1 | ID), data = data, family = binomial)
or, alternatively, if there are few waves you can get to much the same place by including Time as a predictor and making that variable a factor in your input data. If there are many time points, that could make the output unwieldy.
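If you want to try these specifications end to end before running them on your own data, here is a sketch on simulated data. Everything about the simulation (the names sim, u, n_id, n_wave, the seed, and the true coefficients) is made up for illustration; only the model formulas mirror the ones above. I use factor(Time) in the formula rather than recoding the column, which should be equivalent:

```r
library(dplyr)
library(lme4)

set.seed(1)                 # arbitrary seed, only for reproducibility
n_id   <- 200               # hypothetical number of individuals
n_wave <- 5                 # hypothetical number of waves

sim <- expand.grid(Time = 1:n_wave, ID = 1:n_id)
u   <- rnorm(n_id)                        # person-level intercepts
sim$X <- u[sim$ID] + rnorm(nrow(sim))     # X is correlated with the person effect
# The within-person effect of X is set to 0.5 on the logit scale (a simulation assumption)
eta   <- -0.5 + 0.5 * sim$X + 0.1 * sim$Time + u[sim$ID]
sim$Y <- rbinom(nrow(sim), 1, plogis(eta))

# Individual-level mean of X, as in the first step
sim <- sim %>% group_by(ID) %>% mutate(X_mean = mean(X)) %>% ungroup()

# Random intercept by ID, Time as a continuous fixed predictor
m1 <- glmer(Y ~ X + X_mean + Time + (1 | ID), data = sim, family = binomial)

# Time as a factor: one fixed coefficient per wave after the first
m2 <- glmer(Y ~ X + X_mean + factor(Time) + (1 | ID), data = sim, family = binomial)

fixef(m1)
fixef(m2)
```

With only five waves the factor version adds four coefficients, which is still readable; with dozens of waves the (1 | Time) version is likely to be the more manageable output.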
I've been working on a package to automate some of this, inspired by the xt
suite from Stata, though at this juncture my package is far more limited. It's called panelr
and at present must be downloaded from GitHub. More info available here: https://github.com/jacob-long/panelr
In this case, using panelr
, your situation would work like this:
library(panelr)
pdata <- panel_data(data, id = ID, wave = Time)
model <- wbm(Y ~ X, data = pdata, use.wave = TRUE, family = binomial)
All panelr
is doing is automating what I've explained above. You can drop the individual mean variable without affecting the estimate of the within-subject effect of X
by using the model = "within"
argument.
panelr
is probably a few weeks away from CRAN submission at this point: a few things need documenting, there are a few edge cases where things break unexpectedly, and I want to be more flexible about the handling of time.