-1

I am working on panel regression and decided to switch to lm(), because plm() does not have a good predict() function for test data (as well as linearmodels in Python) and lme4 syntax is not intuitive for me as a newbie to econometrics.

I want to use lm and predict on unseen data.

My lm() equation looks like this with author as a fixed effect

fit <- lm(y ~ a + b + factor(author), data = train)

Obviously it prints me thousands of individual coefficients. How to construct a model in lm(), evaluating for all authors without printing individually?

Anakin Skywalker
  • 2,400
  • 5
  • 35
  • 63
  • P.S. Appreciate any great tutorials in R/Python on predicting panel model on test data and syntax. Obviously did a ton of research in advance. – Anakin Skywalker Mar 11 '22 at 19:49
  • It is obvious that your shown code does not print anything. Please clarify question. – jay.sf Mar 11 '22 at 19:58

1 Answers1

1

Let's assume some concrete data example, e.g.

a <- rnorm(100)
b <- runif(100)

train <- data.frame(a, b, 
                    author = sample(LETTERS[1:10], 100, 1),
                    y = 3*a + .5*b + rnorm(100))

Now we do a fixed effect regression, I assume we do not want any Intercept so the command is

fit <- lm(y ~ a + b + author - 1, data = train)

The - 1 part in the formula leaves the Intercpet out, instead a fixed effect is computed for each author. No base level left out.

Printing this model or it's summary is feasible with the 10 authors in the example but not with thousands in your work.

You can print the coefficents of only a and b like this

> fit$coefficients[c('a', 'b')]
         a          b 
3.02022335 0.09789947 

You can see the coefficients and their p-values via the anova command

> anova(fit)

Analysis of Variance Table

Response: y
          Df  Sum Sq Mean Sq   F value    Pr(>F)    
a          1 1139.90 1139.90 1034.2307 < 2.2e-16 ***
b          1    7.73    7.73    7.0127  0.009591 ** 
author    10   10.75    1.07    0.9751  0.470812    
Residuals 88   96.99    1.10 

And you can even deconstruct summary(fit) for the coefficents or to display the adjusted R²:

> summary(fit)$coefficients[c("a", "b"),]
    Estimate Std. Error    t value     Pr(>|t|)
a 3.02022335 0.09697699 31.1437123 2.679195e-49
b 0.09789947 0.35161039  0.2784317 7.813342e-01
>
> summary(fit)$adj.r.squared
[1] 0.9122033

For other values see help(summary.lm). Should you still want to see the coefficient of author F that is

> fit$coefficients["authorF"]
  authorF 
0.6174314 
Bernhard
  • 4,272
  • 1
  • 13
  • 23