adding lagged variables (t+1), (t+2), (t-1) in panel data

Question

Edit: I'm trying to implement the solution provided in the answer below.

I'm providing new sample data because it closely matches the structure of my data.

> head(Grunfeld, 25)
   firm year    inv  value capital
1     1 1935  317.6 3078.5     2.8
2     1 1936  391.8 4661.7    52.6
3     1 1937  410.6 5387.1   156.9
4     1 1938  257.7 2792.2   209.2
5     1 1939  330.8 4313.2   203.4
6     1 1940  461.2 4643.9   207.2
7     1 1941  512.0 4551.2   255.2
8     1 1942  448.0 3244.1   303.7
9     1 1943  499.6 4053.7   264.1
10    1 1944  547.5 4379.3   201.6
11    1 1945  561.2 4840.9   265.0
12    1 1946  688.1 4900.9   402.2
13    1 1947  568.9 3526.5   761.5
14    1 1948  529.2 3254.7   922.4
15    1 1949  555.1 3700.2  1020.1
16    1 1950  642.9 3755.6  1099.0
17    1 1951  755.9 4833.0  1207.7 
18    1 1952  891.2 4924.9  1430.5
19    1 1953 1304.4 6241.7  1777.3
20    1 1954 1486.7 5593.6  2226.3
21    2 1935  209.9 1362.4    53.8
22    2 1936  355.3 1807.1    50.5
23    2 1937  469.9 2676.3   118.1
24    2 1938  262.3 1801.9   260.2
25    2 1939  230.4 1957.3   312.7



library(plm)
data("Grunfeld", package="plm")

Grunfeld$firm <- as.factor(Grunfeld$firm)

#adding lagged variable (+1)
Grunfeld$inv.plus1 <- NA
for (f in levels(Grunfeld$firm)) {
 Grunfeld[which(Grunfeld$firm == f),]$inv.plus1 <- c(Grunfeld[which(Grunfeld$firm == f),]$inv[-1],NA)
}

#adding lagged variable (+2)
Grunfeld$inv.plus2 <- NA
for (f in levels(Grunfeld$firm)) {
  Grunfeld[which(Grunfeld$firm == f),]$inv.plus2 <- c(Grunfeld[which(Grunfeld$firm == f),]$inv[-c(1,2)],NA)
}

#adding lagged variable (-1)
Grunfeld$inv.minus1 <- NA
for (f in levels(Grunfeld$firm)) {
  Grunfeld[which(Grunfeld$firm == f),]$inv.minus1 <- c(Grunfeld[which(Grunfeld$firm == f),]NA,$inv[-1],)
}

While it works for the (+1) variable, I'm unable to derive the correct code for (+2) or (-1). What am I doing wrong?

I'm using the plm package and would like to run the following regressions: lm(inv(t+1) ~ inv(t) + other variables(t)), lm(inv(t+2) ~ inv(t) + other variables(t)) and lm(inv(t+3) ~ inv(t) + other variables(t)).

Is there a convenient way to add lagged variables in both directions (i.e. inv(t+1), inv(t-1)) for a horizon of up to 3 years? My data is in a balanced format, although it contains quite a few NA values, and I don't know whether it still counts as a balanced panel. Is there a package or function for this? Thank you in advance for your help.
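On the balance question: a panel is usually called balanced when every firm has a row for every year. In that index sense, NA values in other columns do not make it unbalanced, although lm() will drop those rows if the variables appear in the model. A minimal base-R check, assuming `firm` and `year` columns; the toy data frame `panel` and the helper `is_balanced` are hypothetical:

```r
# Hypothetical toy panel: 3 firms x 3 years, with NA values in earnings
panel <- data.frame(
  firm     = rep(1:3, each = 3),
  year     = rep(2000:2002, times = 3),
  earnings = c(10, NA, 12, 5, 6, NA, 7, 8, 9)
)

# Balanced (index sense): every firm has exactly one row per year
is_balanced <- function(df) {
  counts <- table(df$firm, df$year)
  all(counts == 1)
}

is_balanced(panel)        # TRUE: NA values in earnings do not remove rows
is_balanced(panel[-2, ])  # FALSE: firm 1 no longer has a row for 2001
```

If you already use plm, `pdim()` on a `pdata.frame` also reports whether the panel is balanced.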

Edit: I tried to do the same as in the answer provided below:

dd$earnings.plus1 <- c(dd$earnings[-1], NA)
dd$earnings.plus2 <- c(dd$earnings[-c(1:2)], NA, NA)

but now I'm trying to define dd$earnings.minus1:

z <- nrow(dd)
dd$earnings.minus1 <- c(NA, dd$earnings[-z])

but it does not work properly: the last value from firm 1 is moved into the first row of firm 2. This doesn't seem to happen with the solution above. What's the difference here?

You want to "regress" a single value against another single value? That doesn't really make sense. — MrFlick, Jul 11 '14 at 19:19
Sorry for being unclear, but I have other independent variables besides earnings. — Gritti, Jul 11 '14 at 19:23
But if you are only using a single observation as the response, you can't do any estimation (it's even more impossible when adding additional covariates). What is the vector of values you are using for the regression? — MrFlick, Jul 11 '14 at 19:26
The aim is to derive the coefficients of the model to generate earnings forecasts. — Gritti, Jul 11 '14 at 19:33
possible duplicate of [Adding lagged variables to an lm model?](http://stackoverflow.com/questions/13096787/adding-lagged-variables-to-an-lm-model) — Carl, Jul 11 '14 at 20:06
Just use the `lag` function from the `plm` package you are already using. — Helix123, Dec 09 '18 at 22:57

Carl · Accepted Answer · score 0 · 2014-07-14T14:02:53.887

One way to accomplish this is to duplicate columns, offset by the desired number of time steps. This assumes that each row is a particular time (which seems to be the case based on your question) and that the spacing between rows is constant and exactly equal to the lag you want (again, this seems to be the case based on your question).

So, given your data, something like:

 dd$earnings.plus1 <- c(dd$earnings[-1], NA)
 dd$earnings.plus2 <- c(dd$earnings[-c(1:2)], NA, NA)
 # ...etc

and then trimming your lm data by the appropriate number of rows:

 lm(earnings.plus1 ~ earnings + year + firm, data=head(dd,-1))
 lm(earnings.plus2 ~ earnings + year + firm, data=head(dd,-2))
 # ...etc

One could obviously make the implementation more general (e.g., by using `embed`), but for a small, one-off analysis the copy-paste-adjust approach is probably good enough.
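For illustration, here is one possible way to use `embed()` per firm. `embed(x, d)` returns a matrix whose row i is x[i + d - 1], ..., x[i + 1], x[i], so reversing its columns gives the series at t, t+1, ..., t+k. The helper name `leads_with_embed` is hypothetical, and the values are the first five years of inv for firms 1 and 2 from the sample data in the question:

```r
# Build inv, inv.plus1, ..., inv.plusK for one series with embed().
# Rows without a full set of leads are dropped rather than padded with NA.
leads_with_embed <- function(x, k) {
  m <- embed(x, k + 1)                                # needs length(x) > k
  out <- as.data.frame(m[, (k + 1):1, drop = FALSE])  # reorder to t, t+1, ..., t+k
  names(out) <- c("inv", paste0("inv.plus", seq_len(k)))
  out
}

panel <- data.frame(
  firm = rep(1:2, each = 5),
  inv  = c(317.6, 391.8, 410.6, 257.7, 330.8,
           209.9, 355.3, 469.9, 262.3, 230.4)
)

# Apply per firm so leads never cross firm boundaries, then stack the pieces
pieces <- lapply(split(panel, panel$firm), function(d) {
  cbind(firm = d$firm[1], leads_with_embed(d$inv, k = 2))
})
result <- do.call(rbind, pieces)
result
```

Because `embed()` drops the incomplete rows, the result already plays the role of the `head(dd, -2)` trimming, but per firm.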

EDIT:

So, my bad. I'm not sure why you aren't seeing it for the plus shifts, but it should happen there too (with c(x[-1], NA) over the whole column, the first value of firm 2 becomes the last lead of firm 1). I ignored the fact that your data is split by other variables; in reality, you've probably got a column like employee_id. Before creating those lagged variables (or the alternative you edited in), you need to subset your data so that you're only lagging within each group.

Here's how I did the subsetting, just on firm:

dd$firm <- as.factor(dd$firm)
dd$earnings.plus1 <- NA
for (f in levels(dd$firm)) {
  dd[which(dd$firm == f),]$earnings.plus1 <- c(dd[which(dd$firm == f),]$earnings[-1],NA)
}

You can add similar loops for .plus2, etc.
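To avoid writing one loop per horizon, the per-firm shifting can be wrapped in a small helper built on base R's grouped `ave()`. This is a sketch assuming rows are sorted by year within each firm and that years are consecutive; `shift_within` and the toy data are hypothetical. A positive `k` gives a lead (t+k), a negative `k` a lag, and the naive whole-column shift is kept for comparison to show the leak across firms:

```r
# Shift x by k positions within each group g, padding with NA.
# k > 0 gives the value at t + k (a lead); k < 0 gives the value at t - |k| (a lag).
# Assumes rows are sorted by year within each group and years are consecutive.
shift_within <- function(x, g, k) {
  ave(x, g, FUN = function(v) {
    n <- length(v)
    if (k == 0) return(v)
    if (abs(k) >= n) return(rep(NA, n))   # group too short: all NA, no error
    if (k > 0) c(v[(k + 1):n], rep(NA, k)) else c(rep(NA, -k), v[1:(n + k)])
  })
}

dd <- data.frame(
  firm     = rep(1:2, each = 4),
  year     = rep(2001:2004, times = 2),
  earnings = c(1, 2, 3, 4, 10, 20, 30, 40)
)
dd <- dd[order(dd$firm, dd$year), ]

# Naive whole-column shift: firm 1's last value (4) leaks into firm 2's first row
naive.minus1 <- c(NA, dd$earnings[-nrow(dd)])

# Group-wise shifts: the boundary rows of each firm get NA instead
dd$earnings.minus1 <- shift_within(dd$earnings, dd$firm, -1)
dd$earnings.plus1  <- shift_within(dd$earnings, dd$firm, 1)
dd$earnings.plus2  <- shift_within(dd$earnings, dd$firm, 2)
dd
```

With the Grunfeld data from the question this would be, e.g., `Grunfeld$inv.minus1 <- shift_within(Grunfeld$inv, Grunfeld$firm, -1)`, and the same call with k = 1, 2, 3 covers the leads up to three years.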

edited Jul 14 '14 at 14:02

answered Jul 11 '14 at 19:44

Carl


Thank you! Is it necessary to trim the data? Aren't rows whose independent variables are all NA, or whose dependent variable is NA, dropped automatically by lm anyway? Or am I totally wrong here? Another nice feature would be the ability to delete certain rows either by id (e.g. drop an entire firm) or by year for all firms (e.g. shorten the time span of the data). Is Extract the function to use for that? Best regards – Gritti Jul 12 '14 at 07:27
It is not strictly necessary to trim, but I think it makes it clearer what you are doing (rather than assuming the `lm` magic will do it for you). As for your question about firms, filtering by firm is easy: you can use `subset`, for example, or, if you have lots of data or operations and need to worry about performance, the `data.table` package can provide more efficient filtering. – Carl Jul 12 '14 at 14:37
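To illustrate both points: `lm()` takes its `na.action` from `getOption("na.action")`, which is `na.omit` by default, so rows with NA in any variable used by the formula are dropped and the fit matches an explicitly trimmed one. `subset()` covers dropping a firm or restricting the years. A sketch with a hypothetical toy panel:

```r
# Hypothetical toy panel: 3 firms x 5 years
dd <- data.frame(
  firm     = rep(1:3, each = 5),
  year     = rep(2001:2005, times = 3),
  earnings = c(5, 7, 6, 9, 11,  2, 3, 5, 4, 6,  8, 8, 10, 12, 11)
)
# earnings(t+1) within each firm; the last year of every firm gets NA
dd$earnings.plus1 <- ave(dd$earnings, dd$firm, FUN = function(v) c(v[-1], NA))

# lm() drops incomplete rows through its default na.action (na.omit)...
fit_auto <- lm(earnings.plus1 ~ earnings, data = dd)
# ...so the fit equals one on explicitly trimmed data
fit_trimmed <- lm(earnings.plus1 ~ earnings,
                  data = dd[!is.na(dd$earnings.plus1), ])
all.equal(coef(fit_auto), coef(fit_trimmed))  # TRUE
nobs(fit_auto)                                # 12: 3 firms x 4 usable years

# Dropping a whole firm, or restricting the years for all firms
no_firm2   <- subset(dd, firm != 2)
early_only <- subset(dd, year <= 2003)
```

Note that `early_only` still carries `earnings.plus1` for 2003, taken from 2004; if that matters, subset the years first and compute the leads afterwards.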
Thank you again for your help. I'm now trying the same with a lagged variable (the opposite direction), but it is not working properly. Here is what I did: z <- nrow(set); set$earnings.lag1 <- c(NA, earnings$ta[-z]) – Gritti Jul 14 '14 at 11:55
I edited the posted question to make it easier for readers to follow. How do respond to people? The field somehow ignores my @carl. – Gritti Jul 14 '14 at 19:42
@Gritti I'm not sure I understand what you're asking with "How do respond to people?" Do you mean making sure that people who have answered see that you've updated the question? I'm not sure SO offers that capability, but you can always comment on an answer to alert that user. – Carl Jul 14 '14 at 21:41
I was going to say "How do I respond to people?" I was trying to put an "@Carl" at the beginning of the post and typed it in as described in the help, but it somehow didn't work. But thanks again for your great help. I was able to derive the lagged variables in both directions now. – Gritti Jul 15 '14 at 09:21
I'm trying to use your solution on a smaller subset in which, for example, a firm has only 2 yearly observations, so the code throws an error. Is there a way to require each firm to have more observations than the number of lags I want to introduce? For example, firm 2 has only 2 observations left in my subset, so lagged variables of +3 observations are infeasible. – Gritti Jul 23 '14 at 16:43
@Gritti First you'd need to decide how you want those cases treated relative to what you're actually doing; e.g., you might want to fit the +3 lags only against firms that have sufficient data. I think it's preferable to subset your data (i.e., exclude firms with insufficient data) to accomplish that, but you might instead pad firms that have insufficient data with `NA`s and then use the `na.action` argument of `lm`. Again, it depends on what you're actually doing, and if you want specific guidance, you should ask a new question. – Carl Jul 23 '14 at 17:44
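Following the preferred option above (excluding firms with insufficient data), a minimal sketch that keeps only firms with more observations than the largest shift you need. The toy data and `k_max` are hypothetical; `ave(..., FUN = length)` gives each row the number of rows of its firm:

```r
# Hypothetical toy panel: firm 1 has 5 years, firm 2 only 2, firm 3 has 4
dd <- data.frame(
  firm     = c(1, 1, 1, 1, 1, 2, 2, 3, 3, 3, 3),
  year     = c(2001:2005, 2004:2005, 2002:2005),
  earnings = c(5, 7, 6, 9, 11, 3, 4, 8, 8, 10, 12)
)
k_max <- 3  # largest lead/lag you want to build

# Number of observations of each row's firm
n_obs <- ave(dd$year, dd$firm, FUN = length)

# Keep firms with more than k_max rows, so at least one row per firm
# can have a non-NA value at t + k_max
dd_ok <- dd[n_obs > k_max, ]
unique(dd_ok$firm)  # 1 3
```

The lead/lag code can then be run on `dd_ok` without short firms causing length mismatches.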

score 0 · Answer 2 · answered Jul 15 '14 at 09:10

Thanks to Carl, I was able to derive the code for adding lagged variables in both directions in panel data:

library(plm)
data("Grunfeld", package="plm")

Grunfeld$firm <- as.factor(Grunfeld$firm)



#adding lagged variable (+1)
Grunfeld$inv.plus1 <- NA
for (f in levels(Grunfeld$firm)) {
 Grunfeld[which(Grunfeld$firm == f),]$inv.plus1 <- c(Grunfeld[which(Grunfeld$firm == f),]$inv[-1],NA)
}

#adding lagged variable (+2)
Grunfeld$inv.plus2 <- NA
for (f in levels(Grunfeld$firm)) {
 Grunfeld[which(Grunfeld$firm == f),]$inv.plus2 <- c(Grunfeld[which(Grunfeld$firm == f),]$inv[-c(1,2)],NA, NA)
}


#adding lagged variable (-1)
Grunfeld$inv.minus1 <- NA
for (f in levels(Grunfeld$firm)) {
 Grunfeld[which(Grunfeld$firm == f),]$inv.minus1 <- c(NA,Grunfeld[which(Grunfeld$firm == f),]$inv[-nrow(Grunfeld[which(Grunfeld$firm == f),])])
}

Let me know if there is an easier way, as this requires a lot of code for a rather simple task. But who am I to judge :D
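A shorter route, as Helix123 suggests in the comments, is plm's panel-aware `lag()`. After converting to a `pdata.frame`, `lag()` on a column shifts within each firm along the time index, and a negative `k` gives a lead (as far as I know, recent plm versions also export `lead()`). A sketch on the Grunfeld data; please check the details against the plm documentation for your version:

```r
library(plm)
data("Grunfeld", package = "plm")

# Declare the panel structure: individual index first, then time index
pdat <- pdata.frame(Grunfeld, index = c("firm", "year"))

# lag() on a pseries shifts within each firm, so nothing leaks across firms
pdat$inv.plus1  <- lag(pdat$inv, k = -1)  # inv(t+1)
pdat$inv.plus2  <- lag(pdat$inv, k = -2)  # inv(t+2)
pdat$inv.plus3  <- lag(pdat$inv, k = -3)  # inv(t+3)
pdat$inv.minus1 <- lag(pdat$inv, k = 1)   # inv(t-1)

head(pdat[, c("firm", "year", "inv", "inv.plus1", "inv.plus2", "inv.minus1")])
```

Because the shift follows the time index rather than the row order, missing years within a firm should also be handled correctly, which the row-based loops above do not do.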
