I am attempting to run a model with county, year, and state:year fixed effects. The lm() approach looks like this:
lm <- lm(data = mydata, formula = y ~ x + county + year + state:year)
where county, year, and state are all factors (so state:year is their interaction).
Because I have a large number of counties, running the model with lm() is very slow. More frustrating, given the number of models I need to produce, lm() returns a much larger object than plm(). The following plm() command yields the same coefficients and significance levels for my main variables:
library(plm)
plm <- plm(data = mydata, formula = y ~ x + year + state:year, index = "county", model = "within")
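To see why the two approaches can agree on the coefficient of x, here is a minimal base-R sketch of the within (demeaning) transformation, which is my understanding of what model = "within" does. It uses hypothetical simulated data (`sim`, `county_effect`, and the helper `demean` are made up for illustration) and keeps only county effects; year and state:year are left out for brevity.

```r
# Hypothetical simulated panel (not the original data): 50 counties x 10 observations
set.seed(1)
n_county <- 50
n_obs <- 10
sim <- data.frame(county = factor(rep(seq_len(n_county), each = n_obs)))
county_effect <- rnorm(n_county, sd = 3)
# x is correlated with the county effect, so the fixed effects matter for the slope
sim$x <- rnorm(nrow(sim)) + county_effect[sim$county]
sim$y <- 2 * sim$x + county_effect[sim$county] + rnorm(nrow(sim))

# Dummy-variable fit: county dummies entered directly, as in the lm() call above
fit_lsdv <- lm(y ~ x + county, data = sim)

# Within transformation: subtract each county's mean (ave() defaults to FUN = mean)
demean <- function(v, g) v - ave(v, g)
sim$y_w <- demean(sim$y, sim$county)
sim$x_w <- demean(sim$x, sim$county)

# OLS on the demeaned data; no intercept is needed because both columns have mean zero
fit_within <- lm(y_w ~ 0 + x_w, data = sim)

coef(fit_lsdv)[["x"]]
coef(fit_within)[["x_w"]]
```

By the Frisch-Waugh-Lovell theorem, the slope on x and the residuals from the demeaned regression match the dummy-variable fit, even though the demeaned fit has no county coefficients at all.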
However, the two models report substantially different R-squared, adjusted R-squared, and other fit statistics. I thought I could solve the R-squared problem by calculating the R-squared for plm by hand:
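SST <- sum((mydata$y - mean(mydata$y))^2)  # total sum of squares
fit <- (mydata$y - plm$residuals)           # fitted values = observed minus residuals
SSR <- sum((fit - mean(mydata$y))^2)        # explained (regression) sum of squares
R2 <- SSR / SST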
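For reference, the by-hand calculation above is the explained-sum-of-squares form of R-squared. Writing $\hat{y}_i$ for the fitted values, $e_i = y_i - \hat{y}_i$ for the residuals, and $\bar{y}$ for the mean of $y$:

$$
R^2 = \frac{\sum_i (\hat{y}_i - \bar{y})^2}{\sum_i (y_i - \bar{y})^2}
    = 1 - \frac{\sum_i e_i^2}{\sum_i (y_i - \bar{y})^2}
$$

The second equality holds for an OLS fit with an intercept, where the residuals sum to zero and are orthogonal to the fitted values, so the total sum of squares splits exactly into explained plus residual parts. If the $\hat{y}_i$ are not the OLS fitted values for the same observations in the same order, that decomposition no longer holds and the first form is not bounded by 1.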
I tested this R-squared code with lm and got the same result reported by summary(lm). However, when I calculated the R-squared for plm, I got a different value, and it was greater than 1.
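A reproducible version of the lm check described above, using hypothetical simulated data (`sim`, `r2_by_hand`, and `demean` are names made up for this sketch). The by-hand formula reproduces summary()'s R-squared for the dummy-variable fit. For comparison, it also computes the R-squared of the county-demeaned regression; my assumption, not verified here against the plm documentation, is that plm's within-model summary reports an R-squared of this kind, which would be one candidate explanation for the gap between the two models.

```r
# Hypothetical simulated data (not the original): 30 counties x 8 observations
set.seed(2)
sim <- data.frame(county = factor(rep(1:30, each = 8)))
sim$x <- rnorm(nrow(sim))
sim$y <- 1.5 * sim$x + rnorm(30, sd = 2)[sim$county] + rnorm(nrow(sim))

# Dummy-variable fit, analogous to the lm() model in the question
fit <- lm(y ~ x + county, data = sim)

# The by-hand calculation from the question, wrapped in a function
r2_by_hand <- function(y, res) {
  SST <- sum((y - mean(y))^2)          # total sum of squares
  fit_vals <- y - res                  # fitted values recovered from residuals
  SSR <- sum((fit_vals - mean(y))^2)   # explained (regression) sum of squares
  SSR / SST
}

r2_lsdv <- r2_by_hand(sim$y, residuals(fit))

# R-squared of the county-demeaned (within) regression, for comparison
demean <- function(v, g) v - ave(v, g)
y_w <- demean(sim$y, sim$county)
x_w <- demean(sim$x, sim$county)
fit_w <- lm(y_w ~ 0 + x_w)
r2_within <- summary(fit_w)$r.squared

c(by_hand = r2_lsdv, summary_lm = summary(fit)$r.squared, within = r2_within)
```

Both fits have the same residuals, but the within R-squared divides by the smaller, demeaned total sum of squares, so it can never exceed the dummy-variable R-squared.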
At this point I checked the coefficients for my fixed effects in plm, and they were different from the coefficients in lm.
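A minimal base-R sketch of one way fixed-effect coefficients can differ while the fitted model is the same: the coding of the county dummies. With R's default treatment contrasts, lm() reports each county relative to a baseline county, whereas a no-intercept coding reports each county's level. I am assuming, without having checked the plm documentation, that plm's fixef() reports levels by default; the data and names here (`sim`, `county_level`) are hypothetical.

```r
# Hypothetical simulated data (not the original): 3 counties, 20 observations each
set.seed(3)
sim <- data.frame(county = factor(rep(c("A", "B", "C"), each = 20)))
county_level <- c(1, 4, -2)  # made-up true county intercepts for A, B, C
sim$x <- rnorm(nrow(sim))
sim$y <- 0.5 * sim$x + county_level[sim$county] + rnorm(nrow(sim))

# Default treatment contrasts: the intercept is county A's level and the
# other county coefficients are differences from county A
fit_contrast <- lm(y ~ x + county, data = sim)

# No intercept: one coefficient per county, i.e. each county's level
fit_levels <- lm(y ~ 0 + x + county, data = sim)

coef(fit_contrast)
coef(fit_levels)
```

The fitted values and the slope on x are identical in both fits; only the parameterization of the county effects changes, with the level for county B equal to `(Intercept) + countyB` from the contrast-coded fit.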
Can someone please 1) help me understand why I'm getting these differing results and 2) suggest the most efficient way to construct the models I need and obtain correct R-squared values? Thanks!