I'm unit acceptance testing some code I wrote. It's conceivable that at some point in the real world we will have input data where the dependent variable is constant. Not the norm, but possible. A linear model should yield coefficients of 0 in this case (right?), which is fine and what we would want -- but for some reason I'm getting some wild results when I try to fit the model on this use case.
I have tried 3 models and get diffirent weird results every time -- or no results in some cases.
For this use case all of the dependent observations are set at 100, all the freq_weights are set at 1, and the independent variables are a binary coded dummy set of 20 features.
In total there are 150 observations.
Again, this data is unlikely in the real world but I need my code to be able to work on this ugly data. IDK why I'm getting such erroneous and different results.
As I understand with no variance in the dependent variable I should be getting 0 for all my coefficients.
freq = freq['Freq']
Indies = sm.add_constant(df)
model = sm.OLS(df1, Indies)
res = model.fit()
res.params
yields:
const 65.990203
x1 17.214836
reg = statsmodels.GLM(df1, Indies, freq_weights = freq)
results = reg.fit(method = 'lbfgs', max_start_irls=0)
results.params
yields:
const 83.205034
x1 82.575228
reg = statsmodels.GLM(df1, Indies, freq_weights = freq)
result2 = reg.fit()
result2.params
yields
PerfectSeparationError: Perfect separation detected, results not available