
I have two series of data as below. I want to fit an OLS linear regression model for df1 and another OLS linear regression model for df2, and then test whether the y-intercepts of the two models are statistically different (p < 0.05), and likewise whether the slopes are statistically different (p < 0.05). I did the following:

import numpy as np
import math
import matplotlib.pyplot as plt
import pandas as pd
import statsmodels.api as sm

np.inf == float('inf')
data1 = [1, 3, 45, 0, 25, 13, 43]
data2 = [1, 1, 1, 1, 1, 1, 1]
df1 = pd.DataFrame(data1)
df2 = pd.DataFrame(data2)

fig, ax = plt.subplots()
df1.plot(figsize=(20, 10), linewidth=5, fontsize=18, ax=ax, kind='line')
df2.plot(figsize=(20, 10), linewidth=5, fontsize=18, ax=ax, kind='line')
plt.show()

model1 = sm.OLS(df1, df1.index)
model2 = sm.OLS(df2, df2.index)

results1 = model1.fit()
results2 = model2.fit()

print(results1.summary())
print(results2.summary())

Results #1

                                 OLS Regression Results                                
=======================================================================================
Dep. Variable:                      0   R-squared (uncentered):                   0.625
Model:                            OLS   Adj. R-squared (uncentered):              0.563
Method:                 Least Squares   F-statistic:                              10.02
Date:                Mon, 01 Mar 2021   Prob (F-statistic):                      0.0194
Time:                        20:34:34   Log-Likelihood:                         -29.262
No. Observations:                   7   AIC:                                      60.52
Df Residuals:                       6   BIC:                                      60.47
Df Model:                           1                                                  
Covariance Type:            nonrobust                                                  
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
x1             5.6703      1.791      3.165      0.019       1.287      10.054
==============================================================================
Omnibus:                          nan   Durbin-Watson:                   2.956
Prob(Omnibus):                    nan   Jarque-Bera (JB):                0.769
Skew:                           0.811   Prob(JB):                        0.681
Kurtosis:                       2.943   Cond. No.                         1.00
==============================================================================

Results #2

                                 OLS Regression Results                                
=======================================================================================
Dep. Variable:                      0   R-squared (uncentered):                   0.692
Model:                            OLS   Adj. R-squared (uncentered):              0.641
Method:                 Least Squares   F-statistic:                              13.50
Date:                Mon, 01 Mar 2021   Prob (F-statistic):                      0.0104
Time:                        20:39:14   Log-Likelihood:                         -5.8073
No. Observations:                   7   AIC:                                      13.61
Df Residuals:                       6   BIC:                                      13.56
Df Model:                           1                                                  
Covariance Type:            nonrobust                                                  
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
x1             0.2308      0.063      3.674      0.010       0.077       0.384
==============================================================================
Omnibus:                          nan   Durbin-Watson:                   0.148
Prob(Omnibus):                    nan   Jarque-Bera (JB):                0.456
Skew:                           0.000   Prob(JB):                        0.796
Kurtosis:                       1.750   Cond. No.                         1.00
==============================================================================

This is as far as I have got, but I think something is wrong. Neither of these regression outputs shows the y-intercept. Also, I expect the coef in results #2 to be 0 since I expect the slope to be 0 when all the values are 1, but the result shows 0.2308. Any suggestions or guiding material will be greatly appreciated.

KubiK888

1 Answer

In statsmodels an OLS model does not fit an intercept by default (see the docs).

exog array_like A nobs x k array where nobs is the number of observations and k is the number of regressors. An intercept is not included by default and should be added by the user. See statsmodels.tools.add_constant.

The documentation on the exog argument of the OLS constructor suggests using statsmodels.tools.add_constant to add an intercept column to the data.
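
As a minimal sketch of how the original code could be adjusted (still regressing the values on the integer index, as in the question):

    import numpy as np
    import pandas as pd
    import statsmodels.api as sm

    data1 = [1, 3, 45, 0, 25, 13, 43]
    df1 = pd.DataFrame(data1)

    X1 = sm.add_constant(np.asarray(df1.index, dtype=float))  ## prepends a column of ones
    results1 = sm.OLS(df1, X1).fit()
    print(results1.params)  ## now reports both const (the intercept) and x1 (the slope)

Fitting df2 the same way should give an intercept of 1 and a slope of 0, which matches the expectation in the question.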

To perform a hypothesis test on the values of the coefficients, this question provides some guidance. Unfortunately, this only works if the variances of the residual errors are the same.

We can start by checking whether the residuals of the two regressions have the same variance (using Levene's test) and ignore the coefficients of the regression models for now.


    import numpy as np
    import pandas as pd
    from scipy.stats import levene
    from statsmodels.tools import add_constant
    from statsmodels.formula.api import ols ## use formula api to make the tests easier
    
    np.inf == float('inf')
    
    data1 = [1, 3, 45, 0, 25, 13, 43]
    data2 = [1, 1, 1, 1, 1, 1, 1]
    
    df1 = add_constant(pd.DataFrame(data1)) ## add a constant  column so we fit an intercept
    df1 = df1.reset_index() ## just doing this to make the index a column of the data frame
    df1 = df1.rename(columns={'index':'x', 0:'y'})  ## the old index will now be called x and the old values are now y
    
    df2 = add_constant(pd.DataFrame(data2)) ## this does nothing because the y column is already a constant
    df2 = df2.reset_index()
    df2 = df2.rename(columns={'index':'x', 0:'y'})  ## the old index will now be called x and the old values are now y
    
    formula1 = 'y ~ x + const' ## define formulae
    formula2 = 'y ~ x'
    
    model1 = ols(formula1, df1).fit()
    model2 = ols(formula2, df2).fit()
    
    print(levene(model1.resid, model2.resid))

The output of the Levene test looks like this:

LeveneResult(statistic=7.317386741297884, pvalue=0.019129208414097015)

So we can reject the null hypothesis that the residual distributions have the same variance at alpha=0.05.

There is not much point in testing the linear regression coefficients now because the residuals don't have the same distributions. It is important to remember that in a regression problem it doesn't make sense to compare the regression coefficients independently of the data they are fit on: the distribution of the regression coefficients depends on the distribution of the data.

Let's see what happens when we try the proposed test anyway. Combining the instructions above with the f_test method of the fitted OLS results yields the following code:


    ## stack the data and add the indicator variable as described in the
    ## stackexchange question referenced above
    df1['c'] = 1 ## add an indicator variable that tags the first group of points
    
    df_all = pd.concat([df1, df2], ignore_index=True).drop('const', axis=1) ## DataFrame.append is deprecated, use pd.concat
    df_all = df_all.rename(columns={'index':'x', 0:'y'})  ## the old index will now be called x and the old values are now y
    df_all = df_all.fillna(0) ## several values are missing in the indicator column after stacking
    df_all['int'] = df_all['x'] * df_all['c'] ## construct the interaction column
    
    print(df_all) ## look at the data
    
    formula = 'y ~ x + c + int' ## define the linear model using the formula api
    result = ols(formula, df_all).fit() 
    hypotheses = '(c = 0), (int = 0)'
    
    f_test = result.f_test(hypotheses)
    print(f_test)

The result of the f-test looks like this:

<F test: F=array([[4.01995453]]), p=0.05233934453138028, df_denom=10, df_num=2>

The result of the f-test means that we just barely fail to reject the joint null hypothesis specified in the hypotheses variable, namely that the coefficient of the indicator variable 'c' (the difference in intercepts) and the coefficient of the interaction term 'int' (the difference in slopes) are both zero.
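
To make the connection to the original question explicit: in the combined model y ~ x + c + int, the Intercept and x terms describe the c=0 group, while the coefficients on c and int are the intercept and slope differences of the other group. A small sketch continuing from the fit above, in case you also want to inspect or test each difference on its own:

    print(result.params)             ## Intercept and x describe the c=0 group; c and int are the differences
    print(result.t_test('c = 0'))    ## test the intercept difference on its own
    print(result.t_test('int = 0'))  ## test the slope difference on its own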

From this example it is clear that the f test on the regression coefficients is not very powerful if the residuals do not have the same variance.

Note that the given example has so few points that it is hard for the statistical tests to clearly distinguish the two cases, even though to the human eye they are very different. The statistical tests are designed to make few assumptions about the data, but the approximations behind them work better the more data you have. When testing statistical methods to see if they accord with your expectations, it is often best to start by constructing large samples with little noise and then see how well the methods work as your data sets get smaller and noisier.

For the sake of completeness, I will construct an example where the Levene test fails to distinguish the two regression models but the f test succeeds. The idea is to compare the regression of a noisy data set with the regression of its reverse: the distribution of residual errors will be the same, but the relationship between the variables will be very different. Note that this would not work with the noisy dataset given in the previous example, because that data is so noisy the f test cannot distinguish between a positive and a negative slope.

import numpy as np
import pandas as pd
from scipy.stats import levene
from statsmodels.tools import add_constant
from statsmodels.formula.api import ols ## use formula api to make the tests easier

n_samples = 6

noise = np.random.randn(n_samples) * 5

data1 = np.linspace(0, 30, n_samples) + noise
data2 = data1[::-1] ## reverse the time series

df1 = add_constant(pd.DataFrame(data1)) ## add a constant  column so we fit an intercept
df1 = df1.reset_index() ## just doing this to make the index a column of the data frame
df1 = df1.rename(columns={'index':'x', 0:'y'})  ## the old index will now be called x and the old values are now y

df2 = add_constant(pd.DataFrame(data2)) ## add a constant column to the reversed series as well (unlike the first example, data2 is not constant here)
df2 = df2.reset_index()
df2 = df2.rename(columns={'index':'x', 0:'y'})  ## the old index will now be called x and the old values are now y

formula1 = 'y ~ x + const' ## define formulae
formula2 = 'y ~ x'

model1 = ols(formula1, df1).fit()
model2 = ols(formula2, df2).fit()

print(levene(model1.resid, model2.resid))

## stack the data and add the indicator variable as described in the
## stackexchange question referenced above
df1['c'] = 1 ## add an indicator variable that tags the first group of points

df_all = pd.concat([df1, df2], ignore_index=True).drop('const', axis=1) ## DataFrame.append is deprecated, use pd.concat
df_all = df_all.rename(columns={'index':'x', 0:'y'})  ## the old index will now be called x and the old values are now y
df_all = df_all.fillna(0) ## several values are missing in the indicator column after stacking
df_all['int'] = df_all['x'] * df_all['c'] ## construct the interaction column

print(df_all) ## look at the data

formula = 'y ~ x + c + int' ## define the linear model using the formula api
result = ols(formula, df_all).fit() 
hypotheses = '(c = 0), (int = 0)'

f_test = result.f_test(hypotheses)
print(f_test)

The results of the Levene test and the f test follow:

LeveneResult(statistic=5.451203655948632e-31, pvalue=1.0)
<F test: F=array([[10.62788052]]), p=0.005591319998324387, df_denom=8, df_num=2>

A final note: since we are doing multiple comparisons on this data and stopping as soon as we get a significant result (if the Levene test rejects the null we quit; if it doesn't, then we do the f test), this is a stepwise hypothesis test and we are actually inflating our false positive error rate. We should correct our p-values for multiple comparisons before we report our results. Note that the f test already does this for the hypotheses we test about the regression coefficients. I am a bit fuzzy on the underlying assumptions of these testing procedures, so I am not 100% sure that you are better off making the following correction, but keep it in mind in case you feel you are getting false positives too often.

from statsmodels.sandbox.stats.multicomp import multipletests

print(multipletests([1, .005591], .05)) ## correct our p-values given that we did two comparisons

The output looks like this:

(array([False,  True]), array([1.        , 0.01115074]), 0.025320565519103666, 0.025)

This means we rejected the second null hypothesis under the correction and that the corrected p-values look like [1., 0.011150]. The last two values are corrected significance levels under two different correction methods.
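
For readability the returned tuple can also be unpacked; as far as I can tell the four values are the rejection decisions, the corrected p-values, and the Šidák- and Bonferroni-corrected significance levels (continuing from the multipletests call above):

reject, pvals_corrected, alpha_sidak, alpha_bonf = multipletests([1, .005591], .05)
print(reject)                   ## which null hypotheses are rejected after the correction
print(pvals_corrected)          ## p-values adjusted for the two comparisons
print(alpha_sidak, alpha_bonf)  ## corrected significance levels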

I hope this helps anyone trying to do this type of work. If anyone has anything to add I would welcome comments. This isn't my area of expertise so I could be making some mistakes.

Adam Jones
  • We can use a heteroscedasticity robust covariance, e.g. `fit(cov_type="HC3")` to get correct standard errors and inference if the variance is not equal and constant across observations. – Josef Jun 29 '21 at 16:14
  • @Josef I tried that out on the fit where I test the hypothesis that the intercepts and slopes are not equal and it actually fails a little worse at rejecting the null hypothesis (p=0.072). Is doing Levene's test beforehand too conservative? Or is it possible my code is in error in some way? I see in [this post](https://stackoverflow.com/questions/30553838/getting-statsmodels-to-use-heteroskedasticity-corrected-standard-errors-in-coeff) there is some note of assumptions not being tested for these covariance types. Could there be some underlying assumption that is not satisfied by the data? – Adam Jones Jul 02 '21 at 16:38
  • "HC" does not need extra assumption on the structure of the data because each observation is treated separately. Only correlation robust types require information or structure across samples. – Josef Jul 02 '21 at 16:56
  • It seems to me setting this HC3 covariance type is making the hypothesis test less willing to reject the null hypotheses. I am also seeing some warnings I didn't see before: `ValueWarning: covariance of constraints does not have full rank. The number of constraints is 3, but rank is 2` and `UserWarning: kurtosistest only valid for n>=20 ... continuing anyway, n=14`. It seems perhaps there is some condition of asymptotic convergence that is not satisfied for this little data. I am wondering if maybe the approach I am using is better for small data sets. – Adam Jones Jul 02 '21 at 19:28
  • Are the 7 observations all you have? Most robust sandwich covariance matrices require larger number of observations to be reliable than simple standard errors because they impose fewer restrictions on second or higher moments. You could also try HC1, but with only 7 observations, inference will not have large power. – Josef Jul 03 '21 at 18:07
  • This is the example given by the question asker. I am doing some related work where I want to detect changes in the distribution as early as possible while avoiding false positive errors as much as possible. I am finding the method I proposed has a mean time to detect between 4 and 5 observations, depending on the size of the baseline and the significance level. – Adam Jones Jul 05 '21 at 17:55
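
For anyone who wants to try the heteroscedasticity-robust covariance that Josef suggests in the comments above, a minimal sketch reusing formula and df_all from the second example in the answer (as noted in the discussion, the resulting p-value may differ from the non-robust fit):

result_robust = ols(formula, df_all).fit(cov_type="HC3")  ## refit with HC3 robust standard errors
print(result_robust.f_test('(c = 0), (int = 0)'))         ## same joint test, now using the robust covariance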