So, I'd like to run a regression on a panel data, using two-ways effects, for time and stores. If the panel is perfectly balanced, it works fine, but for some reason, if it's not, the code gets stuck. (see: https://stat.ethz.ch/pipermail/r-help/2010-May/239272.html).
My data in particular is not unbalanced in nature, but it has some NAs, so I guess it's becoming unbalanced when the plm function removes rows with NA. I wrote a sample code to exemplify the data I have.
If I run this:
set.seed(123)
library(plm)
number.of.days <- 1100
number.of.stores <- 1000
days <- sort(rep(c(1:number.of.days),number.of.stores))
stores <- rep(c(1:number.of.stores),number.of.days)
data <- cbind.data.frame(stores,days,matrix(rnorm(number.of.days*number.of.stores*7),nrow=number.of.days*number.of.stores,ncol=7))
colnames(data)[3:9] <- c('y',paste0('x',1:6))
data <- plm.data(data,c("stores","days"))
fit <- plm(y ~ x1 + x2 + x3 + x4 + x5 + x6, data = data, index=c("stores","days"), effect="twoway", model="within")
It works correctly, because the panel is balanced. However, if I create some NA values:
data$y[sample(1:number.of.days*number.of.stores,150)] <- NA
data$x1[sample(1:number.of.days*number.of.stores,150)] <- NA
data$x2[sample(1:number.of.days*number.of.stores,150)] <- NA
data$x3[sample(1:number.of.days*number.of.stores,150)] <- NA
data$x4[sample(1:number.of.days*number.of.stores,150)] <- NA
data$x5[sample(1:number.of.days*number.of.stores,150)] <- NA
data$x6[sample(1:number.of.days*number.of.stores,150)] <- NA
And try to run the regression again:
fit <- plm(y ~ x1 + x2 + x3 + x4 + x5 + x6, data = data, index=c("stores","days"), effect="twoway", model="within")
It does not work (the code apparently never stops running)
I tried using 'individual' effect for stores and adding a matrix with dummies for time, but since there are 1100 days, it becomes just as slow.
I assume this is not a rare problem. Is there any known solution?
Thank you