7

So I am running a fixed effects model using the plm package in R, and I am wondering how I can compare which of two models are more suitable.

For example, here is the code for two models I have constructed:

library(plm)

eurofix <- plm(rlogmod ~ db+gdp+logvix+gb+i+logtdo+fx+ld+euro+core, 
               data=euro, 
               model="within")

eurofix2 <- plm(rlogmod ~ db+gdp+logvix+gb+i+logtdo+ld+euro+core, 
                data=euro,
                model="within")

I know that with a regular lm call, I can compare two models by running an anova test, but that does not seem to work in this case. I always get the following error:

Error in UseMethod("anova") : 
  no applicable method for 'anova' applied to an object of class "c('plm', 'panelmodel')"

Does anybody know what to do with the plm package? Is the Wald Test appropriate?

Adam Smith
  • I settled with just using the Wald Test. That works fine with the plm package as it turns out. – NuclearPenguins Feb 05 '15 at 02:38
  • Thanks, I was about to ask this same question until I saw this. Can you share the code to implement the Wald Test to compare the two models? I'm having trouble with it. – daanoo Apr 12 '15 at 15:43
  • Yes, the ANOVA in linear regression models is equivalent to the Wald test. See this [link](https://stats.stackexchange.com/questions/131401/how-to-get-anova-table-with-robust-standard-errors). – Juan Antonio Roldán Díaz Jun 19 '18 at 12:47
  • The euro data in the datasets package (available by default) does not have a structure that supports this question. – IRTFM Jun 23 '18 at 21:12
  • @42, there is a question on Cross Validated asking more or less the same with a working example, including data and all, here: https://stats.stackexchange.com/questions/351746/comparing-groups-in-repeated-measures-fe-models-with-a-nested-error-component – Eric Fail Jun 24 '18 at 10:20
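As the comments note, a Wald test is a reasonable way to compare two nested `plm` models. A minimal sketch, assuming `waldtest()` from the lmtest package (whose default method generally works with `plm` objects) and the `euro` data from the question:

```r
library(plm)
library(lmtest)  # waldtest() compares nested models via a Wald test

# eurofix2 is eurofix without the fx regressor, so the models are nested
waldtest(eurofix, eurofix2)
```

A significant p-value would favor keeping `fx`; otherwise the smaller model suffices.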

3 Answers

5

The following code answered a similar question on Cross Validated, which also asks about testing a (joint) hypothesis in a plm model. It should be straightforward to apply the code to your question.

library(plm)  # Use plm
library(car)  # Use F-test in command linearHypothesis
library(tidyverse)
data(egsingle, package = 'mlmRev')
dta <- egsingle %>% mutate(Female = recode(female, .default = 0L, `Female` = 1L))
plm1 <- plm(math ~ Female * (year), data = dta, index = c('childid', 'year', 'schoolid'), model = 'within')

# Output from `summary(plm1)` --- I deleted a few lines to save space.
# Coefficients:
#                 Estimate Std. Error t-value Pr(>|t|)    
# year-1.5          0.8842     0.1008    8.77   <2e-16 ***
# year-0.5          1.8821     0.1007   18.70   <2e-16 ***
# year0.5           2.5626     0.1011   25.36   <2e-16 ***
# year1.5           3.1680     0.1016   31.18   <2e-16 ***
# year2.5           3.9841     0.1022   38.98   <2e-16 ***
# Female:year-1.5  -0.0918     0.1248   -0.74     0.46    
# Female:year-0.5  -0.0773     0.1246   -0.62     0.53    
# Female:year0.5   -0.0517     0.1255   -0.41     0.68    
# Female:year1.5   -0.1265     0.1265   -1.00     0.32    
# Female:year2.5   -0.1465     0.1275   -1.15     0.25    
# ---

xnames <- names(coef(plm1)) # names of all estimated coefficients in 'plm1'
# Use 'grepl' to build a logical vector that is TRUE for every coefficient
# whose name starts with 'Female:'. The pattern is generic: to pick up
# every coefficient that starts with 'year', write grepl('^year', xnames).
picked <- grepl('^Female:', xnames)
linearHypothesis(plm1, xnames[picked])

# Hypothesis:
# Female:year - 1.5 = 0
# Female:year - 0.5 = 0
# Female:year0.5 = 0
# Female:year1.5 = 0
# Female:year2.5 = 0
# 
# Model 1: restricted model
# Model 2: math ~ Female * (year)
# 
#   Res.Df Df Chisq Pr(>Chisq)
# 1   5504                    
# 2   5499  5  6.15       0.29
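Applied to the models in the question, the same idea reduces to a single restriction, since `eurofix2` drops only `fx` from `eurofix` (a sketch; the `euro` data is not shown in the question):

```r
library(car)  # linearHypothesis() accepts plm objects, as above

# eurofix nests eurofix2: the only difference is the fx regressor,
# so comparing the two models amounts to testing fx = 0 in the larger one
linearHypothesis(eurofix, "fx = 0")
```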
semibruin
  • Do you think it is possible to rewrite the `-c(1:5)` block in some way that make the code more generic? (also here) – Eric Fail Jun 20 '18 at 04:25
  • @EricFail Good suggestion. I replaced the counting (`-c(1:5)`) with regular expression matching. – semibruin Jun 20 '18 at 04:45
1

I struggled with this too, but finally came up with a solution (with the help of a PhD friend). Using your example, you can compare the panel models with the AIC criterion as follows:

library(plm)

eurofix <- plm(rlogmod ~ db+gdp+logvix+gb+i+logtdo+fx+ld+euro+core, 
               data=euro, 
               model="within")

eurofix2 <- plm(rlogmod ~ db+gdp+logvix+gb+i+logtdo+ld+euro+core, 
                data=euro,
                model="within")

# AIC = log(RSS/N) + 2K/N  for linear models (N observations, K regressors)
# AIC = log(RSS/n) + 2K/n  for panel models, where n = N - K - df.residual
# adjusts for the parameters absorbed by the within transformation

Sum1 <- summary(eurofix)
RSS1 <- sum(Sum1$residuals^2)
K1 <- max(eurofix$assign)
N1 <- length(eurofix$residuals)
n1 <- N1 - K1 - eurofix$df.residual

AIC_eurofix = log(RSS1/n1) + (2*K1)/n1

Sum2 <- summary(eurofix2)
RSS2 <- sum(Sum2$residuals^2)
K2 <- max(eurofix2$assign)
N2 <- length(eurofix2$residuals)
n2 <- N2 - K2 - eurofix2$df.residual

AIC_eurofix2 = log(RSS2/n2) + (2*K2)/n2

The model with the lower AIC value is the preferred one!
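The calculation above can be wrapped in a small helper so it is not repeated for each model. A sketch, where `aic_plm` is a hypothetical name, assuming a `plm` object fitted with `model = "within"`:

```r
# Hypothetical helper: AIC for a plm "within" model, following the
# formula above (not an official plm function)
aic_plm <- function(mod) {
  rss <- sum(residuals(mod)^2)    # residual sum of squares
  K   <- max(mod$assign)          # number of slope coefficients
  N   <- length(residuals(mod))   # total observations
  n   <- N - K - mod$df.residual  # parameters absorbed by the fixed effects
  log(rss / n) + 2 * K / n
}

# aic_plm(eurofix)   # the model with the lower value is preferred
# aic_plm(eurofix2)
```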

Glen Wade
0

Did you use the plm function anova()? I can't tell based on how you wrote your question. If you didn't, give it a shot.

If your question is more of a statistical one about selecting an approach that helps you adjudicate between the models rather than a technical one, then an answer would really depend on the way you're defining 'suitable.' If the only difference in the two models is the inclusion of fx in the first of the two, several statistical tests could assess the extent to which your model minimizes the squared error (e.g., R^2) or falls short due to the non-random distribution of residuals (e.g., VIF).

If you want to know whether including fx yields a model that fits your data while resisting overfitting, consider using the BIC. I typically favor the BIC because it penalizes additional parameters more aggressively than other model fit statistics like the AIC. The model with the lowest BIC tends to be the best-fitting model (though you should also use the Wald test/F-test confirmatorily IMO, especially as your nested models are their ideal use case). You could first try comparing the model objects directly:

anova(model1, model2)

If that doesn't work, the generic BIC() function (provided by base R's stats package and extended with methods by packages such as lme4) may be useful:

BIC(model1, model2)

Let me know if I'm misinterpreting the question - and let us know what you find out!

J.Q