
I tried several times to run a regression with lm and plm, and I get different results.

First, I used lm as follows:

fixed.Region1 <- lm(CapNormChange ~ Policychanges + factor(Region), 
    data=Panel)

Then I used plm in the following way:

fixed.Region2 <- plm(CapNormChange ~ Policychanges + factor(Region), 
    data=Panel, index=c("Region", "Year"), model="within", effect="individual")

I think there is something wrong with plm, because I don't see an intercept in the results (see below). Furthermore, I am not entirely sure whether + factor(Region) is necessary; however, if I leave it out, I don't see the coefficients (and significance) for the dummies.

So, my question is:

  1. Am I using the plm function wrong? (Or, if so, what is wrong about it?)
  2. If not, how can it be that the results are different?

If somebody could give me a hint, I would really appreciate it.

Results from lm:

Call:
lm(formula = CapNormChange ~ Policychanges + factor(Region), 
    data = Panel)

Residuals:
    Min      1Q  Median      3Q     Max 
-31.141  -4.856  -0.642   1.262 192.803 

Coefficients:
                                Estimate Std. Error t value Pr(>|t|)    
(Intercept)                      17.3488     4.9134   3.531 0.000558 ***
Policychanges                     0.6412     0.1215   5.277 4.77e-07 ***
factor(Region)Asia              -19.3377     6.7804  -2.852 0.004989 ** 
factor(Region)C America + Carib   0.1147     6.8049   0.017 0.986578    
factor(Region)Eurasia           -17.6476     6.8294  -2.584 0.010767 *  
factor(Region)Europe            -20.7759     8.8993  -2.335 0.020959 *  
factor(Region)Middle East       -17.3348     6.8285  -2.539 0.012200 *  
factor(Region)N America         -17.5932     6.8064  -2.585 0.010745 *  
factor(Region)Oceania           -14.0440     6.8417  -2.053 0.041925 *  
factor(Region)S America         -14.3580     6.7781  -2.118 0.035878 *  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 19.72 on 143 degrees of freedom
Multiple R-squared:  0.3455,    Adjusted R-squared:  0.3043 
F-statistic: 8.386 on 9 and 143 DF,  p-value: 5.444e-10

Results from plm:

 Call:
plm(formula = CapNormChange ~ Policychanges, data = Panel, effect = "individual", 
    model = "within", index = c("Region", "Year"))

Balanced Panel: n = 9, T = 17, N = 153

Residuals:
     Min.   1st Qu.    Median   3rd Qu.      Max. 
-31.14147  -4.85551  -0.64177   1.26236 192.80277 

Coefficients:
              Estimate Std. Error t-value  Pr(>|t|)    
Policychanges  0.64118    0.12150   5.277 4.769e-07 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Total Sum of Squares:    66459
Residual Sum of Squares: 55627
R-Squared:      0.16299
Adj. R-Squared: 0.11031
F-statistic: 27.8465 on 1 and 143 DF, p-value: 4.7687e-07
    I don't actually see a question in your question – Dason Mar 01 '18 at 20:25
  • I tried to make it clearer now. Thanks. – Sarah8888 Mar 01 '18 at 20:33
  • I don't understand your question. Why exactly do you expect a linear regression (aka pooled OLS) to produce the same results as those from a fixed effects panel regression? Are you strictly talking about the "intercept vs no intercept" difference? If that's the case, individual fixed effects model _does not_ have a single intercept, it has multiple, hence not reported as such. – acylam Mar 01 '18 at 20:59
  • I assumed that since fixed effects ("individual") is the same as introducing dummy variables (to my understanding), the results should either include n-1 dummy variables + intercept reported, or all dummy variables. In this case, the plm function (second case) reports n-1 dummy variables, but no intercept. Hence, that is what I don't understand. Or maybe I don't understand something else. – Sarah8888 Mar 01 '18 at 21:04
  • With regard to the other question: the lm regression includes the dummy variables (Regions), and hence, to my understanding, corresponds to a fixed effects model (that uses regions as effects). Hence, that is the reason why I don't understand the results. – Sarah8888 Mar 01 '18 at 21:06
  • @Dason... you're burdened with toooooo much knowledge on useless R packages. ;p Q move to "Looks OK", end of triage review. – ZF007 Mar 01 '18 at 21:36
  • Not sure, what you mean by this. – Sarah8888 Mar 01 '18 at 22:22
  • 1
    ok, I found the issue, if that is of interest. I basically had a mistake in the index formulation. I I use – Sarah8888 Mar 01 '18 at 22:24
  • @ZF007 I wasn't commenting from the review queues and the question has been edited since I wrote my comment. I'm not entirely sure what you're trying to say either way but I'm guessing it's irrelevant after taking what I just mentioned into account. – Dason Mar 01 '18 at 22:57
  • @Sarah8888.. you can post a self-answer in which you state the piece of code you changed. Not the whole code again ;p After two days you can select your own answer as best answer. – ZF007 Mar 01 '18 at 23:03
  • Sorry about this... would have been better. Agree. – Sarah8888 Mar 02 '18 at 08:38

1 Answer


You need to leave out + factor(Region) from your formula for the within model with plm to get what you want.

Within models do not have an intercept, but some software packages (esp. Stata and Gretl) report one. You can estimate it with plm by running within_intercept on your estimated model. The help page has the details about this somewhat artificial intercept.
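As a sketch (assuming a panel data frame Panel with the columns used in the question), this would look like:

```r
library(plm)

# Within (fixed effects) model -- no factor(Region) in the formula
fixed.Region2 <- plm(CapNormChange ~ Policychanges,
                     data = Panel, index = c("Region", "Year"),
                     model = "within", effect = "individual")

# Overall intercept as reported by Stata/Gretl, with its standard error
within_intercept(fixed.Region2)
```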

If you want the individual effects and their significance, use summary(fixef(&lt;your_plm_model&gt;)). Use pFtest to check whether the within specification seems worthwhile compared to pooled OLS.
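For example (assuming fixed.Region2 is the within model estimated without factor(Region)):

```r
# Individual (Region) fixed effects with standard errors and p-values
summary(fixef(fixed.Region2))

# Pooled OLS on the same formula; pFtest tests the within model against it
pooled <- plm(CapNormChange ~ Policychanges, data = Panel,
              index = c("Region", "Year"), model = "pooling")
pFtest(fixed.Region2, pooled)
```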

The R-squareds diverge between the lm model and the plm model. The lm model (used like this with the dummies, it is usually called the LSDV model, for least squares dummy variables) gives what is sometimes called the overall R-squared, while plm gives you the R-squared of the demeaned regression, sometimes called the within R-squared. Stata's documentation has some details about this: https://www.stata.com/manuals/xtxtreg.pdf
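A sketch of the difference (assuming the same Panel data): the within R-squared can be reproduced with lm by demeaning the variables by Region first, i.e. running OLS on deviations from the group means.

```r
# Demean outcome and regressor within each Region (ave() defaults to group means)
Panel$y_dm <- Panel$CapNormChange - ave(Panel$CapNormChange, Panel$Region)
Panel$x_dm <- Panel$Policychanges  - ave(Panel$Policychanges,  Panel$Region)

# R-squared of this demeaned regression matches plm's within R-squared,
# while the LSDV lm with factor(Region) reports the (higher) overall R-squared
summary(lm(y_dm ~ x_dm, data = Panel))$r.squared
```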

Helix123