
I'd like to run a regression on panel data using two-way effects, for time and stores. If the panel is perfectly balanced it works fine, but if it is not, the code gets stuck (see: https://stat.ethz.ch/pipermail/r-help/2010-May/239272.html).

My data is not unbalanced in nature, but it has some NAs, so I suspect it becomes unbalanced when the plm function removes the rows containing NA. I wrote some sample code to illustrate the data I have.

If I run this:

set.seed(123)
library(plm)

number.of.days   <- 1100
number.of.stores <- 1000

# One row per store-day combination: a perfectly balanced panel
days   <- sort(rep(1:number.of.days, number.of.stores))
stores <- rep(1:number.of.stores, number.of.days)

# Response y and six regressors x1..x6, all drawn from a standard normal
data <- cbind.data.frame(stores, days,
                         matrix(rnorm(number.of.days * number.of.stores * 7),
                                nrow = number.of.days * number.of.stores, ncol = 7))
colnames(data)[3:9] <- c('y', paste0('x', 1:6))

# plm.data() is deprecated in recent plm versions; pdata.frame() is the replacement
data <- plm.data(data, c("stores", "days"))
fit <- plm(y ~ x1 + x2 + x3 + x4 + x5 + x6, data = data,
           index = c("stores", "days"), effect = "twoways", model = "within")

It works correctly, because the panel is balanced. However, if I create some NA values:

# Note the parentheses around the product: without them R parses
# 1:number.of.days*number.of.stores as (1:number.of.days) * number.of.stores
data$y[sample(1:(number.of.days*number.of.stores), 150)]  <- NA
data$x1[sample(1:(number.of.days*number.of.stores), 150)] <- NA
data$x2[sample(1:(number.of.days*number.of.stores), 150)] <- NA
data$x3[sample(1:(number.of.days*number.of.stores), 150)] <- NA
data$x4[sample(1:(number.of.days*number.of.stores), 150)] <- NA
data$x5[sample(1:(number.of.days*number.of.stores), 150)] <- NA
data$x6[sample(1:(number.of.days*number.of.stores), 150)] <- NA

And try to run the regression again:

fit <- plm(y ~ x1 + x2 + x3 + x4 + x5 + x6, data = data,
           index = c("stores", "days"), effect = "twoways", model = "within")

It does not work: the code apparently never stops running.
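For what it's worth, a quick check (a sketch using plm's pdata.frame and pdim) confirms that dropping the NA rows is what unbalances the panel:

# Sketch: after removing the rows with NAs, the panel is no longer balanced
complete <- pdata.frame(na.omit(data), index = c("stores", "days"))
pdim(complete)  # should report an unbalanced panel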

I tried using the 'individual' effect for stores and adding dummies for time instead, but since there are 1100 days, that approach is just as slow.
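For reference, that workaround looks roughly like this (just a sketch of the idea; with ~1100 day dummies it is very slow):

# Store fixed effects via the within transformation, day effects as explicit dummies
fit.dummies <- plm(y ~ x1 + x2 + x3 + x4 + x5 + x6 + factor(days),
                   data = data, index = c("stores", "days"),
                   effect = "individual", model = "within")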

I assume this is not a rare problem. Is there any known solution?

Thank you

thiagogps
  • Running your code I got an out-of-memory error: `Reached total allocation of 8139Mb: see help(memory.size)`. However, if you reduce your sample size, the error vanishes. Is that the same error you are getting with your real data? – Rodrigo Remedio Dec 09 '16 at 09:24
  • If I run it on a Linux server, which is what I usually do, it just never ends. But when I try it in local RStudio, yes, this happens. About the size you are correct, but my real data is just as big. – thiagogps Dec 09 '16 at 10:23
  • Please specify what you mean by "It does not work". Do you get some kind of error message, or what observed behaviour signals "does not work"? – Helix123 Dec 09 '16 at 14:02
  • What do you mean by "it does not work"? Is the model stopped by an error, or is the final output not understandable? – A1010 Dec 07 '19 at 00:12
  • You can try to set `options("plm.fast" = TRUE)` from plm version 2.4 onwards. – Helix123 Mar 01 '21 at 18:11

1 Answer


The felm function from the lfe package is able to handle this (and efficiently, too).

Running

fit2 <- felm(y ~ x1 + x2 + x3 + x4 + x5 + x6 | stores + days | 0 | stores, data = data)

on the data with the NAs yields a result.

Note the formula specification: the second part names the factors to be projected out (i.e. the fixed effects), the third part (0 here) is for instrumental variables, and the final stores specifies the variable for clustering the standard errors. For details, see the excellent felm help file and the lfe package documentation.
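If you also need the estimated effects themselves, lfe can recover them after the fit; a short sketch (getfe is part of lfe):

summary(fit2)      # slope estimates, with standard errors clustered by stores
fe <- getfe(fit2)  # recovers the projected-out store and day effects
head(fe)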

Felix