I have used Statsmodels to fit an OLS linear regression model that predicts a dependent variable from about 10 independent variables. The independent variables are all categorical.
I am interested in looking more closely at the significance of the coefficients for one of the independent variables. It has 4 categories, so 3 coefficients, each of which is highly significant. I would also like to look at the joint significance of all 3 coefficients together. From my (limited) understanding, this is often done with a Wald test of the hypothesis that all of the coefficients are 0.
How exactly is this done using Statsmodels? I see there is a wald_test method on the fitted OLS results. It seems you have to pass in an array with one entry for every coefficient in the model when using this method.
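For reference, the Statsmodels documentation describes the array argument to wald_test (r_matrix) as a set of linear restrictions, one row per restriction, rather than as a list of coefficient values. Below is a minimal sketch of building such a matrix with NumPy. It assumes 39 parameters (the length of lm.params in this model) and that the three coefficients of interest sit at positions 1 to 3; the names n_params, idx and R are only illustrative.

```python
import numpy as np

n_params = 39      # assumed: number of fitted parameters in the model
idx = [1, 2, 3]    # assumed: positions of the three coefficients of interest

# One row per restriction; each row selects exactly one coefficient,
# so row i encodes the hypothesis "coefficient idx[i] equals 0".
R = np.zeros((len(idx), n_params))
R[np.arange(len(idx)), idx] = 1.0

print(R.shape)     # three restrictions over all parameters
print(R[:, :5])    # the first few columns show the selected positions
```

A matrix like this has three rows, so a test built from it would have three numerator degrees of freedom, whereas a single 1-D vector corresponds to one restriction.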
My approach was the following...
First, here are all of the coefficients:
np.array(lm.params) = array([ 0.21538725, 0.05675108, 0.05020252, 0.08112228, 0.00074715,
0.03886747, 0.00981819, 0.19907263, 0.13962354, 0.0491201 ,
-0.00531318, 0.00242845, -0.0097336 , -0.00143791, -0.01939182,
-0.02676771, 0.01649944, 0.01240742, -0.00245309, 0.00757727,
0.00655152, -0.02895381, -0.02027537, 0.02621716, 0.00783884,
0.05065323, 0.04264466, -0.13068456, -0.15694931, -0.25518566,
-0.0308599 , -0.00558183, 0.02990139, 0.02433505, -0.01582824,
-0.00027538, 0.03170669, 0.01130944, 0.02631403])
I am only interested in the 2nd through 4th params, i.e. indices 1-3 (these are the 3 coefficients of interest).
# start from all zeros, then fill in the three coefficients of interest
coeffs = np.zeros_like(lm.params)
coeffs[1:4] = [0.05675108, 0.05020252, 0.08112228]
Checking to make sure this worked:
array([ 0. , 0.05675108, 0.05020252, 0.08112228, 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. ])
Looks good, now to run the test!
lm.wald_test(coeffs) =
<class 'statsmodels.stats.contrast.ContrastResults'>
<F test: F=array([[ 13.11493673]]), p=0.000304699208434, df_denom=1248, df_num=1>
Is this the correct approach? I could really use some help!
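For comparison, here is a minimal sketch of a joint test on synthetic stand-in data, not the model above. The df_num=1 in the output above suggests that the vector of coefficient values was treated as a single linear restriction. The sketch instead passes a restriction matrix with one row per dummy coefficient to f_test, the F-test counterpart of wald_test on the fitted results. The data, the column names group and y, and the effect sizes are all made up for illustration.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic stand-in data (hypothetical): one categorical predictor, 4 levels
rng = np.random.default_rng(0)
n = 400
levels = np.array(["a", "b", "c", "d"])
effects = np.array([0.0, 0.5, 1.0, 1.5])   # made-up level effects
codes = rng.integers(0, 4, size=n)
df = pd.DataFrame({
    "group": levels[codes],
    "y": effects[codes] + rng.normal(size=n),
})

lm = smf.ols("y ~ C(group)", data=df).fit()

# Params are ordered: intercept, then the three dummy coefficients.
# Each row of R picks out one dummy, so H0 is "all three equal 0".
k = len(lm.params)
R = np.eye(k)[1:4]

joint = lm.f_test(R)
print(joint)                 # F statistic, p-value and degrees of freedom
print(int(joint.df_num))     # number of restrictions tested jointly
```

In the real model the same idea would apply with R built over all 39 parameters, taking rows 1 to 3 of the identity matrix.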