I am used to running linear regression models in Stata or R, but I am moving more of my workflow over to Python.
The useful thing about these two programs is that they assume you do not care about every entity or time fixed effect in a linear model, so when estimating panel models they drop multicollinear dummies from the model (and report which ones they dropped).
While I understand that estimating models this way is not ideal and that one should be careful about which regressions to run, it is useful in practice: you can see results first and worry about the nuances of the dummies later (especially since you don't care about the dummies in a fully saturated fixed-effects model).
Let me provide an example. The following requires linearmodels, loads a dataset, and attempts to run a panel regression. It is a modified version of the example from the linearmodels documentation.
# Load the data (requires statsmodels and linearmodels)
import statsmodels.api as sm
from linearmodels.datasets import wage_panel
import pandas as pd
data = wage_panel.load()
year = pd.Categorical(data.year)
data = data.set_index(['nr', 'year'])
data['year'] = year
print(wage_panel.DESCR)
print(data.head())
# Run the panel regression
from linearmodels.panel import PanelOLS
exog_vars = ['exper','union','married']
exog = sm.add_constant(data[exog_vars])
mod = PanelOLS(data.lwage, exog, entity_effects=True, time_effects=True)
fe_te_res = mod.fit()
print(fe_te_res)
This gives the following error:
AbsorbingEffectError: The model cannot be estimated. The included effects have fully absorbed one or more of the variables. This occurs when one or more of the dependent variable is perfectly explained using the effects included in the model.
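As far as I can tell, the error comes from exper: within each person it appears to rise by one every year, so entity effects plus time effects explain it completely. The sketch below illustrates that idea on a synthetic balanced panel using only NumPy. The array `exper`, the starting values, and the helper `two_way_demean` are made up for illustration and are not part of linearmodels.

```python
import numpy as np

# Synthetic balanced panel (an assumption, not the wage_panel data):
# each entity starts at some experience level and gains one unit per period.
rng = np.random.default_rng(0)
n_entities, n_periods = 5, 8
start = rng.integers(0, 10, size=(n_entities, 1))
exper = (start + np.arange(n_periods)).astype(float)  # shape (5, 8)


def two_way_demean(x):
    """Remove entity means and period means from a balanced (entity x period) array."""
    entity_mean = x.mean(axis=1, keepdims=True)
    period_mean = x.mean(axis=0, keepdims=True)
    return x - entity_mean - period_mean + x.mean()


resid = two_way_demean(exper)
fully_absorbed = np.allclose(resid, 0.0)
print("exper fully absorbed by entity and time effects:", fully_absorbed)
```

Nothing is left of the variable after removing both sets of means, which is the situation the error message describes.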
However, you can estimate the same model in Stata. First export the data by running:
data.drop(columns='year').to_stata('data.dta')
Then run the equivalent commands in Stata (after loading the data):
xtset nr year
xtreg lwage exper union married i.year, fe
This produces the following output:
. xtreg lwage exper union married i.year, fe
note: 1987.year omitted because of collinearity
Fixed-effects (within) regression Number of obs = 4360
Group variable: nr Number of groups = 545
R-sq: within = 0.1689 Obs per group: min = 8
between = 0.0000 avg = 8.0
overall = 0.0486 max = 8
F(9,3806) = 85.95
corr(u_i, Xb) = -0.1747 Prob > F = 0.0000
------------------------------------------------------------------------------
lwage | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
exper | .0638624 .0032594 19.59 0.000 .0574721 .0702527
union | .0833697 .0194393 4.29 0.000 .0452572 .1214821
married | .0583372 .0183688 3.18 0.002 .0223235 .0943509
|
year |
1981 | .0496865 .0200714 2.48 0.013 .0103348 .0890382
1982 | .0399445 .019123 2.09 0.037 .0024521 .0774369
1983 | .0193513 .018662 1.04 0.300 -.0172373 .0559398
1984 | .0229574 .0186503 1.23 0.218 -.0136081 .0595229
1985 | .0081499 .0191359 0.43 0.670 -.0293677 .0456674
1986 | .0036329 .0200851 0.18 0.856 -.0357456 .0430115
1987 | 0 (omitted)
|
_cons | 1.169184 .0231221 50.57 0.000 1.123851 1.214517
-------------+----------------------------------------------------------------
sigma_u | .40761229
sigma_e | .35343397
rho | .57083029 (fraction of variance due to u_i)
------------------------------------------------------------------------------
Notice that Stata arbitrarily dropped 1987 from the regression but still ran it. Is there a way to get similar functionality in linearmodels or statsmodels?
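To make the behaviour I am after concrete, here is a rough sketch using only NumPy. It keeps a column only if it raises the rank of the columns kept so far, and reports the rest as omitted. The helper `drop_collinear` and the toy design matrix are hypothetical; this is not how Stata or linearmodels decide what to drop, only an illustration of "drop the redundant dummies, say which ones, and carry on".

```python
import numpy as np


def drop_collinear(X, names):
    """Greedily keep columns that add rank; return the kept matrix, kept names, dropped names."""
    kept, dropped = [], []
    for j, name in enumerate(names):
        candidate = X[:, kept + [j]]
        # A column is redundant if adding it does not increase the rank.
        if np.linalg.matrix_rank(candidate) > len(kept):
            kept.append(j)
        else:
            dropped.append(name)
    return X[:, kept], [names[j] for j in kept], dropped


# Toy design: a constant plus a full set of two year dummies (perfectly collinear).
names = ["const", "d_1980", "d_1981"]
X = np.array([
    [1.0, 1.0, 0.0],
    [1.0, 0.0, 1.0],
    [1.0, 1.0, 0.0],
    [1.0, 0.0, 1.0],
])

X_kept, kept_names, dropped = drop_collinear(X, names)
for name in dropped:
    print(f"note: {name} omitted because of collinearity")
```

Which column gets dropped depends on column order here, since later columns are tested against earlier ones. In the toy case the last dummy goes, which happens to match what Stata did with 1987 above.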