
I have some panel data that looks like this (code to enter my dataset is at the end):

  countrycode year X
1         ARG 2015 2
2         ARG 2016 2
3         ARG 2017 1
4         AUS 2015 1
5         AUS 2016 3
6         AUS 2017 2
7         USA 2015 6
8         USA 2016 5
9         USA 2017 8

I'd like to difference the X variable within each country (i.e. subtract last year's X from this year's X). It works perfectly when I don't use pipes:

library(tidyverse)
library(plm)

pdf <- pdata.frame(df, index = c("countrycode", "year"))

# This works perfectly
pdf <- mutate(pdf, dX = pdf$X - lag(pdf$X))

The results are exactly what I'd want: every 2015 value of dX is NA, because there is no 2014 value of X to compare with.

  countrycode year X dX
1         ARG 2015 2 NA
2         ARG 2016 2  0
3         ARG 2017 1 -1
4         AUS 2015 1 NA
5         AUS 2016 3  2
6         AUS 2017 2 -1
7         USA 2015 6 NA
8         USA 2016 5 -1
9         USA 2017 8  3

But when I try to use %>%:

pdf <- pdf %>% mutate(dX2 = X - lag(X))

the results no longer take the panel structure into account: dX2 is differenced straight across countries. For example, dX2 for USA in 2015 should be NA, but instead it's 4 (USA's 2015 value of 6 minus AUS's 2017 value of 2).

  countrycode year X dX dX2
1         ARG 2015 2 NA  NA
2         ARG 2016 2  0   0
3         ARG 2017 1 -1  -1
4         AUS 2015 1 NA   0
5         AUS 2016 3  2   2
6         AUS 2017 2 -1  -1
7         USA 2015 6 NA   4
8         USA 2016 5 -1  -1
9         USA 2017 8  3   3

Is there some way to use pipes in plm or with panel data?

Full code here:

library(tidyverse)
library(plm)

df <- data.frame(stringsAsFactors=FALSE,
   countrycode = c("ARG", "ARG", "ARG", "AUS", "AUS", "AUS", "USA", "USA",
                   "USA"),
          year = c(2015L, 2016L, 2017L, 2015L, 2016L, 2017L, 2015L, 2016L,
                   2017L),
             X = c(2L, 2L, 1L, 1L, 3L, 2L, 6L, 5L, 8L)
)
df

# using panel
pdf <- pdata.frame(df, index = c("countrycode", "year"))

# This works perfectly
pdf <- mutate(pdf, dX = pdf$X - lag(pdf$X))
pdf

# Pipe doesn't work across the panel
pdf <- pdf %>% mutate(dX2 = X - lag(X))
pdf
Jeremy K.

2 Answers


You need to group by countrycode and specify that you are using lag from dplyr (and not plm), so that the lag is taken within each country:

pdf <- pdf %>% 
  group_by(countrycode) %>%
  mutate(dX2 = X - dplyr::lag(X))

Results:

  countrycode year X dX dX2
1         ARG 2015 2 NA  NA
2         ARG 2016 2  0   0
3         ARG 2017 1 -1  -1
4         AUS 2015 1 NA  NA
5         AUS 2016 3  2   2
6         AUS 2017 2 -1  -1
7         USA 2015 6 NA  NA
8         USA 2016 5 -1  -1
9         USA 2017 8  3   3
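
To check the grouped version in isolation, here is a minimal sketch that runs on a plain data.frame rather than a pdata.frame (an assumption made to keep plm out of the picture; the name `res` is only illustrative). It sorts by country and year first, since dplyr::lag shifts by row position within each group.

```r
library(dplyr)

# Same data as in the question, as a plain data.frame (no plm involved)
df <- data.frame(
  countrycode = rep(c("ARG", "AUS", "USA"), each = 3),
  year = rep(2015:2017, times = 3),
  X = c(2L, 2L, 1L, 1L, 3L, 2L, 6L, 5L, 8L),
  stringsAsFactors = FALSE
)

res <- df %>%
  arrange(countrycode, year) %>%        # dplyr::lag shifts by row, so sort first
  group_by(countrycode) %>%             # restart the lag within each country
  mutate(dX2 = X - dplyr::lag(X)) %>%
  ungroup()

res$dX2
# NA  0 -1 NA  2 -1 NA -1  3
```

The first row of each country is NA because the lag never reaches back into the previous group.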
Randall Helms
  • Your code gives the same output as mine above (my desired outcome is for dX2 to match dX by having NA in it). I've tried `mutate(dX2 = X - dplyr::lag(X))` and tried `mutate(dX2 = X - plm::lag(X))`, but neither gives the same result as the non-piped code (which gives dX). – Jeremy K. Oct 09 '18 at 14:47
  • I edited my first answer to include group_by(countrycode). Does it work for you now? – Randall Helms Oct 10 '18 at 08:17

I believe this happens for the same reason that

with(pdf, X - lag(X))

does not give the expected answer (one that respects the panel structure) but instead returns:

[1] NA  0 -1  0  2 -1  4 -1  3
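
These are the numbers a plain one-row shift produces when the country boundaries are ignored. A small base R sketch (the names `shifted` and `row_diff` are illustrative, and the hand-written shift is only a stand-in for whatever lag is actually dispatched inside with()):

```r
# X exactly as stored in the data, in row order
X <- c(2L, 2L, 1L, 1L, 3L, 2L, 6L, 5L, 8L)

# Shift by one row, with no notion of country or year
shifted <- c(NA, head(X, -1))

row_diff <- X - shifted
row_diff
# NA  0 -1  0  2 -1  4 -1  3
```

Rows 4 and 7 (AUS 2015 and USA 2015) get 0 and 4 because they are compared with the last row of the previous country.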

The evaluation with with() happens inside the first argument. When a pdata.frame is accessed this way, its internal structure is used, in which a column is not a pseries object but the bare underlying type (e.g. numeric). When a column is accessed with the $ accessor instead, it becomes a pseries at that moment, so the lag method that respects the panel structure is used.
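
A minimal sketch of the $ route without dplyr, assuming plm is installed and that only plm is attached so that lag dispatches to the pseries method. It mirrors the non-piped code from the question and has not been checked against every plm version:

```r
library(plm)

df <- data.frame(
  countrycode = rep(c("ARG", "AUS", "USA"), each = 3),
  year = rep(2015:2017, times = 3),
  X = c(2L, 2L, 1L, 1L, 3L, 2L, 6L, 5L, 8L),
  stringsAsFactors = FALSE
)

pdf <- pdata.frame(df, index = c("countrycode", "year"))

# Extracting with $ yields a pseries, which carries the panel index
class(pdf$X)

# lag() on a pseries is taken within each country, as in the question
pdf$dX <- pdf$X - lag(pdf$X)
pdf
```

Going by the output shown in the question, dX should be NA in every 2015 row.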

My guess is that the eval construct in the definition of the pipe operator (%>%) evaluates its first argument in the same fashion.

(This is a shortcoming of the current definition of the pdata.frame object).

Helix123