I am trying to run a regression where only some of the coefficients can be identified:
import numpy as np
import pandas as pd
import statsmodels.api as sm

data = np.array([[2, 1, 1, 1], [1, 1, 1, 0]])
df = pd.DataFrame(data, columns=['y', 'x1', 'x2', 'x3'])
z = df.pop('y')
mod = sm.OLS(z, sm.add_constant(df))
Now, I have two observations, and the only variable that changes between them is x3. So I would expect that (since I added a constant) the model would be unable to identify x1 or x2 and would omit them. It should, however, give me a coefficient of 1 for x3, since the presence of that effect increases y by one.
Stata gives me exactly this outcome, and it notes that it cannot estimate a standard error on the coefficient for x3. statsmodels, on the other hand...
res = mod.fit()
res.summary()
                            OLS Regression Results
==============================================================================
Dep. Variable:                      y   R-squared:                       1.000
Model:                            OLS   Adj. R-squared:                    nan
Method:                 Least Squares   F-statistic:                       nan
Date:                Sun, 30 Aug 2020   Prob (F-statistic):                nan
Time:                        14:28:28   Log-Likelihood:                 66.947
No. Observations:                   2   AIC:                            -129.9
Df Residuals:                       0   BIC:                            -132.5
Df Model:                           1
Covariance Type:            nonrobust
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
x1             0.5000        inf          0        nan         nan         nan
x2             0.5000        inf          0        nan         nan         nan
x3             1.0000        inf          0        nan         nan         nan
==============================================================================
Omnibus:                          nan   Durbin-Watson:                   0.200
Prob(Omnibus):                    nan   Jarque-Bera (JB):                0.333
Skew:                           0.000   Prob(JB):                        0.846
Kurtosis:                       1.000   Cond. No.                         3.23
==============================================================================
Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The input rank is higher than the number of observations.
"""
What is happening here? And how can I get my expected output?