
I want to create lags of a variable. In a panel data setting, lags should only be taken within each panel.

Why does plm's lag() not respect the panel structure (by default), and is there a way to change that without doing it manually with dplyr?

# Load packages (plm for pdata.frame, dplyr for %>%, group_by and mutate)
library(plm)
library(dplyr)

# Load example data
data("EmplUK", package = "plm")
Em <- pdata.frame(EmplUK, index = c("firm", "year"))

# How I think it should have worked
Em$lwage_incorrect <- lag(Em$wage)

# What actually works
Em <- Em %>% group_by(firm) %>% mutate(lwage_correct = lag(wage))
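To make "lags only within each panel" concrete, here is a minimal base-R sketch that needs neither plm nor dplyr. The helper `panel_lag()` and the toy data are hypothetical, chosen only for illustration. The sketch sorts by panel id and time, then shifts values down by `k` rows inside each id. It assumes each panel has no gaps in its time index; a lag computed from the time index itself would treat a missing year differently.

```r
# Hypothetical helper (not plm's API): lag x by k periods within each panel.
# Assumes consecutive time periods inside each panel (no gaps).
panel_lag <- function(x, id, time, k = 1L) {
  ord <- order(id, time)                 # sort rows by panel, then by time
  shifted <- ave(x[ord], id[ord], FUN = function(v) {
    if (length(v) <= k) return(rep(NA, length(v)))  # panel too short to lag
    c(rep(NA, k), head(v, -k))           # shift down by k rows within the panel
  })
  out <- x
  out[ord] <- shifted                    # put results back in the original row order
  out
}

# Hypothetical toy panel: firm 1 observed 3 years, firm 2 observed 2 years
d <- data.frame(
  firm = c(1, 1, 1, 2, 2),
  year = c(1977, 1978, 1979, 1977, 1978),
  wage = c(10, 11, 12, 20, 21)
)

# Naive row shift: firm 2's first year wrongly receives firm 1's last wage
d$lwage_naive <- c(NA, head(d$wage, -1))

# Within-panel lag: the first year of every firm is NA
d$lwage_panel <- panel_lag(d$wage, d$firm, d$year)
print(d)
```

The only difference between the two new columns is the fourth row, where the naive shift carries a value across the boundary between firms.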
safex

1 Answer


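When I run your code, I get panel-specific lags with both of your methods, so you might want to check it again. I have run into similar trouble before when I wasn't sure which lag function I was actually calling: there is one in base R (stats::lag), one in plm, and one in dplyr, for example. Loading dplyr masks stats::lag, and dplyr::lag shifts values by row position without looking at the panel index, so an unqualified lag() is whichever definition comes first on the search path. Running Em$lwage = plm::lag(Em$wage) removes this ambiguity.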
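To check which lag() an unqualified call will use, find() lists every visible definition in search order. Below is a minimal base-R sketch of that check. The row-shifting `lag` defined in the global environment is hypothetical and stands in for a masking package such as dplyr; plm and dplyr are not loaded here.

```r
# Hypothetical row-shifting lag that masks stats::lag (stand-in for dplyr::lag).
# It shifts values by position and ignores any time or panel index.
lag <- function(x, n = 1L) c(rep(NA, n), head(as.vector(x), -n))

x <- ts(c(10, 20, 30), start = 2000)

# Every visible `lag`, in search order; the first one wins for unqualified calls
where_lag <- find("lag")
print(where_lag)

# Unqualified call: resolves to the global row shift defined above
shifted <- lag(x)
print(shifted)

# Namespace-qualified call: stats::lag keeps the values and
# moves the time base back by one period instead
ts_lag <- stats::lag(x)
print(time(ts_lag))
```

The same reasoning applies to the panel case: writing plm::lag (or dplyr::lag) explicitly means the result no longer depends on the order in which packages were attached.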

tifu