plm() versus lm() with multiple fixed effects

Question

I am attempting to run a model with county, year, and state:year fixed effects. The lm() approach looks like this:

lm <- lm(data = mydata, formula = y ~ x + county + year + state:year)

where county, year, and state:year are all factors.

Because I have a large number of counties, lm() is very slow. More frustrating, given the number of models I need to produce, is that lm() returns a much larger object than plm(). The following plm() command yields the same coefficients and significance levels for my main variables:

plm <- plm(data = mydata, formula = y ~ x + year + state:year, index = "county", model = "within")

However, the two models report substantially different R-squared and adjusted R-squared values. I thought I could solve the R-squared problem by calculating the R-squared for plm by hand:

SST <- sum((mydata$y - mean(mydata$y))^2)
fit <- (mydata$y - plm$residuals)
SSR <- sum((fit - mean(mydata$y))^2)
R2 <- SSR / SST

I tested the R-squared code with lm and got the same result reported by summary(lm). However, when I calculated R-squared for plm I got a different R-squared (and it was greater than 1).

At this point I checked the coefficients on my fixed effects in plm and found that they were different from the coefficients in lm.

Can someone please 1) help me understand why I'm getting these differing results and 2) suggest the most efficient way to construct the models I need and obtain correct R-squareds? Thanks!
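For reference, here is a minimal sketch of how the dummy-variable (lm) fit and a within-transformed fit relate. It uses simulated data and base R only. It does not call plm(), and every name in it (df, lsdv, within_fit, and so on) is made up for illustration. Under these assumptions the slope on x and the residuals are identical, but the R-squared of the demeaned regression uses the demeaned y as its baseline, so it will generally be lower than the one summary() reports for lm(). Whether plm() returns its residuals in the same row order as mydata is not checked here; if it does not, the by-hand calculation above would be subtracting mismatched rows.

```r
# Simulated balanced panel; all names are hypothetical, not from the real data.
set.seed(1)
n_county <- 30
n_year <- 5
df <- expand.grid(county = factor(seq_len(n_county)),
                  year = factor(seq_len(n_year)))
county_effect <- rnorm(n_county)[df$county]
df$x <- rnorm(nrow(df)) + county_effect
df$y <- 2 * df$x + county_effect + 0.5 * as.integer(df$year) + rnorm(nrow(df))

# Dummy-variable (LSDV) fit: county and year effects estimated explicitly.
lsdv <- lm(y ~ x + county + year, data = df)

# Within transformation: subtract county means from y and from each regressor.
demean <- function(v) v - ave(v, df$county)
X <- model.matrix(~ x + year, data = df)[, -1, drop = FALSE]
y_w <- demean(df$y)
X_w <- apply(X, 2, demean)
within_fit <- lm(y_w ~ X_w - 1)

# The slope on x is the same in both fits.
slope_lsdv <- unname(coef(lsdv)["x"])
slope_within <- unname(coef(within_fit)[1])

# The residuals are the same too, as long as the rows stay in the same order.
same_resid <- isTRUE(all.equal(unname(resid(lsdv)), unname(resid(within_fit))))

# Two different R-squared values computed from the same residuals.
rss <- sum(resid(within_fit)^2)
r2_lsdv <- summary(lsdv)$r.squared                        # baseline: y around its overall mean
r2_within <- 1 - rss / sum(y_w^2)                         # baseline: county-demeaned y
r2_by_hand <- 1 - rss / sum((df$y - mean(df$y))^2)        # reproduces the lm() value
```

In this sketch r2_by_hand matches r2_lsdv because the residuals stay aligned with df$y. Computing 1 - RSS/SST avoids building fitted values from the residuals, but it still assumes that y and the residuals refer to the same rows.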

