1

I have an unbalanced panel data set. The following data will do for illustration:

Id <- c(rep(1:4,3),rep(5,2))
Id <- Id[order(Id)]
Year <- c(rep(2000:2002,4),c(2000,2002))

z1 <- rnorm(14)
z2 <- rnorm(14)
z3 <- rnorm(14)
z4 <- rnorm(14)

CORR <- rbind(c(1,0.6,0.5,0.2),c(0.6,1,0.7,0.3),c(0.5,0.7,1,0.4),c(0.2,0.3,0.4,1))
CholCORR <- chol(CORR)
DataTest <- as.data.frame(cbind(z1,z2,z3,z4)%*%CholCORR)
names(DataTest)<-c("y","x1","x2","x3")
DataTest <- cbind(Id, Year, DataTest)
DataTest

   Id Year          y         x1         x2         x3
1   1 2000 -0.7463355 -1.1920928 -1.2358912 -0.2527170
2   1 2001 -0.3475260 -0.1729497 -0.6252036  0.4366446
3   1 2002  0.6815678  0.1775869  0.2860103  0.8479373
4   2 2000  0.7927199  1.2830142  1.7018747  2.4475020
5   2 2001  0.5157535  0.4365100  1.1512340  1.0882581
6   2 2002 -1.2806121  0.3392759  0.5781835  0.2829615
7   3 2000  1.8741119  0.3904028  0.7984584 -1.7015025
8   3 2001 -0.4799960  0.6397883  0.4719150  0.8601328
9   3 2002 -2.2031991  0.3789198 -0.1143526  0.6771387
10  4 2000 -1.0634857 -0.6171644 -0.5732400  0.7718195
11  4 2001  0.3266059 -0.6882776 -0.2013544 -0.7242561
12  4 2002  0.2921222 -0.5149802  0.6180026 -0.4417939
13  5 2000 -2.2447755 -1.3328675 -1.6370130  0.4537451
14  5 2002 -0.9959034 -0.5958327 -0.3408927  0.2162799

I want to do one cross section linear regression for each year (3 regressions) and save the estimated coefficients, but it's complicated by the fact that the panel is unbalanced (Id 5 is missing an observation for Year 2001) and because I want to estimate the following dynamic formula with lag and diff:

formula(diff(y) ~ lag(x1) + x2 + x3)

I have thought of using the plm, reshape, or plyr package, but I can't find an efficient to do it when I want to use lag and diff with my unbalanced panel.

Thanks,

M

Zheyuan Li
  • 71,365
  • 17
  • 180
  • 248
Mace
  • 1,259
  • 4
  • 16
  • 35
  • Why don't you use `split` that will cut your `data.frame` by `year` and use `lapply` on each piece afterwards ? – statquant Apr 25 '13 at 16:51
  • @statquant: That might be a solution. Could you give an example? I have another dirty solution where I use the `plm` package to create a `pdata.frame` and then use `diff` and `lag` to correctly lag and difference my panel. I then merge the results with `Id`, `Year`, `x2`, and `x3`, and then use `lmlist` in the `lme4` package to do linear regressions by `Year`. The first step of lagging/differencing and merging with the other data doesn't seem that efficient, though. – Mace Apr 25 '13 at 21:58
  • @AnandaMahto: I haven't found the answer. Not sure how to update the question. It has code to produce data, and it states the regression formula I want to perform in the cross section, for the `Id`'s available at any given year. Please let me know precisely what you think I should state for it to be easier to answer. – Mace Aug 14 '13 at 14:20
  • Now I found an answer :) – Mace Aug 14 '13 at 14:57

1 Answers1

2

I found a solution that is pretty efficient. It uses split and lapply as suggested by statquant above, but also uses the plm package in order to compute the difference of an unbalanced panel data set.

library(plm)

Id <- c(rep(1:4,3),rep(5,2))
Id <- Id[order(Id)]
Year <- c(rep(2000:2002,4),c(2000,2002))

z1 <- rnorm(14)
z2 <- rnorm(14)
z3 <- rnorm(14)
z4 <- rnorm(14)

CORR <- rbind(c(1,0.6,0.5,0.2),c(0.6,1,0.7,0.3),c(0.5,0.7,1,0.4),c(0.2,0.3,0.4,1))
CholCORR <- chol(CORR)
DataTest <- as.data.frame(cbind(z1,z2,z3,z4)%*%CholCORR)
names(DataTest)<-c("y","x1","x2","x3")
DataTest <- cbind(Id, Year, DataTest)
DataTest

       Id Year          y         x1         x2          x3   
1-2000  1 2000 -0.3837477  0.3065426  1.0646871  0.23757223   
1-2001  1 2001  1.2804333 -0.2015468 -0.2769726 -0.63032551  
1-2002  1 2002 -0.3242049 -1.3518821 -0.4720256  0.08556161 
2-2000  2 2000 -0.3298273 -0.4354473  0.3588493  0.80121465        
2-2001  2 2001 -0.5556866  1.1987959  1.6196555  1.28593473 
2-2002  2 2002  0.2861269  0.2921481  1.1051309  1.66204274  
3-2000  3 2000  0.9224208  0.4255198  0.8947040  1.11784735         
3-2001  3 2001 -1.1052755 -1.7078627 -1.9503432 -2.82343057 
3-2002  3 2002 -2.3020849 -0.8078460 -0.2692165  0.64940791 
4-2000  4 2000  2.5565427  1.7034472  2.2688046  1.71329610         
4-2001  4 2001  0.7015629  0.8518173 -0.2490498  0.70293713 
4-2002  4 2002  0.9454791  0.8830710  0.7355299  0.51836849  
5-2000  5 2000  0.4867604  0.3301825  0.7929939  0.55197991        
5-2002  5 2002 -0.5859263 -0.1164737  0.5831942 -0.03307241 

pDataTest <- pdata.frame(DataTest, index = c('Id','Year'))


pDataTest$yDiff <- diff(pDataTest$y)
pDataTest

       Id Year          y         x1         x2          x3      yDiff
1-2000  1 2000 -0.3837477  0.3065426  1.0646871  0.23757223         NA
1-2001  1 2001  1.2804333 -0.2015468 -0.2769726 -0.63032551  1.6641810
1-2002  1 2002 -0.3242049 -1.3518821 -0.4720256  0.08556161 -1.6046382
2-2000  2 2000 -0.3298273 -0.4354473  0.3588493  0.80121465         NA
2-2001  2 2001 -0.5556866  1.1987959  1.6196555  1.28593473 -0.2258592
2-2002  2 2002  0.2861269  0.2921481  1.1051309  1.66204274  0.8418135
3-2000  3 2000  0.9224208  0.4255198  0.8947040  1.11784735         NA
3-2001  3 2001 -1.1052755 -1.7078627 -1.9503432 -2.82343057 -2.0276963
3-2002  3 2002 -2.3020849 -0.8078460 -0.2692165  0.64940791 -1.1968093
4-2000  4 2000  2.5565427  1.7034472  2.2688046  1.71329610         NA
4-2001  4 2001  0.7015629  0.8518173 -0.2490498  0.70293713 -1.8549798
4-2002  4 2002  0.9454791  0.8830710  0.7355299  0.51836849  0.2439162
5-2000  5 2000  0.4867604  0.3301825  0.7929939  0.55197991         NA
5-2002  5 2002 -0.5859263 -0.1164737  0.5831942 -0.03307241         NA

Temp <-split(pDataTest,'Year')

MyFormula <- formula(yDiff ~ lag(x1) + x2 + x3)
Fit <- lapply(Temp[-1], lm, formula=MyFormula)
Fit

$`2001`

Call:
FUN(formula = ..1, data = X[[1L]])

Coefficients:
(Intercept)      lag(x1)           x2           x3  
      15.27       -52.88        -1.53        39.17  


$`2002`

Call:
FUN(formula = ..1, data = X[[2L]])

Coefficients:
(Intercept)      lag(x1)           x2           x3  
    -0.8738       0.1038       1.2805       0.1626  
Mace
  • 1,259
  • 4
  • 16
  • 35