How do I simplify this code into a for-loop and create a table that displays the F-statistic and p-value for each feature?
print(scipystats.f_oneway(df_data.loc[df_data["SaleCondition"] == 'Normal'].SalePrice,
                          df_data.loc[df_data["SaleCondition"] == 'Abnorml'].SalePrice,
                          df_data.loc[df_data["SaleCondition"] == 'Partial'].SalePrice,
                          df_data.loc[df_data["SaleCondition"] == 'AdjLand'].SalePrice,
                          df_data.loc[df_data["SaleCondition"] == 'Alloca'].SalePrice,
                          df_data.loc[df_data["SaleCondition"] == 'Family'].SalePrice))
>>>F_onewayResult(statistic=45.57842830969571, pvalue=7.988268404991176e-44)
print(scipystats.f_oneway(df_data.loc[df_data["Fence"] == 'MnPrv'].SalePrice,
                          df_data.loc[df_data["Fence"] == 'GdWo'].SalePrice,
                          df_data.loc[df_data["Fence"] == 'GdPrv'].SalePrice,
                          df_data.loc[df_data["Fence"] == 'MnWw'].SalePrice))
>>>F_onewayResult(statistic=4.948158647146986, pvalue=0.002312645635631918)
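The repeated `.loc[... == level]` calls can be collapsed by letting `groupby` split `SalePrice` by every level of a feature and unpacking the groups into `f_oneway`. A minimal sketch, assuming a DataFrame with a `SalePrice` column; the helper `anova_one` and the toy data are hypothetical stand-ins for `df_data`:

```python
import pandas as pd
from scipy import stats


def anova_one(df, feature, target="SalePrice"):
    """One-way ANOVA of `target` across the levels of `feature`.

    groupby drops rows whose `feature` is missing (default dropna=True),
    which matches filtering with df[feature] == level for each level.
    """
    groups = [g[target].to_numpy() for _, g in df.groupby(feature)]
    return stats.f_oneway(*groups)


# Hypothetical toy data standing in for df_data
df_demo = pd.DataFrame({
    "Fence": ["MnPrv", "GdWo", "MnPrv", "GdPrv", "GdWo", "GdPrv", None],
    "SalePrice": [140000, 120000, 150000, 210000, 125000, 205000, 180000],
})

result = anova_one(df_demo, "Fence")
print(result.statistic, result.pvalue)
```

Because missing keys are dropped, this should reproduce what listing each level explicitly with `==` gives, without having to type the level names.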
How do I create a table with the F-statistic and p-value in their respective columns, sorted so that the variable with the highest F-statistic comes first (i.e. descending order)?
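Looping the same idea over a list of columns and collecting the results gives the table, sorted in descending order of F-statistic. Again a sketch on hypothetical toy data; `anova_table` is an illustrative name, and recording NaN for a feature with fewer than two levels is a choice made here, not something `f_oneway` does itself:

```python
import pandas as pd
from scipy import stats


def anova_table(df, features, target="SalePrice"):
    """One-way ANOVA of `target` for each feature, highest F-statistic first."""
    records = []
    for feature in features:
        # One array of target values per level of the feature (missing levels dropped)
        groups = [g[target].to_numpy() for _, g in df.groupby(feature)]
        if len(groups) < 2:
            # f_oneway needs at least two groups; record NaN instead
            records.append((feature, float("nan"), float("nan")))
            continue
        res = stats.f_oneway(*groups)
        records.append((feature, res.statistic, res.pvalue))
    table = pd.DataFrame(records, columns=["Feature", "F-statistics", "P-value"])
    table = table.set_index("Feature")
    # Descending sort puts the largest F-statistic on top; NaN rows go last
    return table.sort_values("F-statistics", ascending=False)


# Hypothetical toy data standing in for df_data
df_demo = pd.DataFrame({
    "Fence": ["MnPrv", "GdWo", "MnPrv", "GdPrv", "GdWo", "GdPrv"],
    "CentralAir": ["Y", "N", "Y", "Y", "N", "Y"],
    "SalePrice": [140000, 120000, 150000, 210000, 125000, 205000],
})
print(anova_table(df_demo, ["Fence", "CentralAir"]))
```

On the real data, something like `anova_table(df_data, df_data.select_dtypes(include='object').columns)` would cover the string-typed columns, assuming the categorical features are stored as strings; numeric codes such as `MSSubClass` would need to be passed explicitly.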
EDIT: Which result is more accurate?
Result from my method:
F-statistics P-value
ExterQual 443.334831 1.439551e-204
KitchenQual 407.806352 3.032213e-192
BsmtQual 392.913506 9.610615e-186
GarageFinish 250.962467 1.199117e-93
MasVnrType 111.672380 4.793331e-65
Foundation 100.253851 5.791895e-91
CentralAir 98.305344 1.809506e-22
HeatingQC 88.394462 2.667062e-67
Neighborhood 71.784865 1.558600e-225
GarageType 71.522123 1.247154e-66
BsmtExposure 70.887984 1.022671e-42
BsmtFinType1 67.602175 1.807731e-63
SaleCondition 45.578428 7.988268e-44
MSZoning 43.840282 8.817634e-35
PavedDrive 42.024179 1.803569e-18
LotShape 40.132852 6.447524e-25
Alley 35.562060 4.899826e-08
SaleType 28.863054 5.039767e-42
FireplaceQu 24.398929 5.016300e-19
Electrical 23.067673 1.663249e-18
HouseStyle 19.595001 3.376777e-25
Exterior1st 18.611743 2.586089e-43
RoofStyle 17.805497 3.653523e-17
Exterior2nd 17.500840 4.842186e-43
BsmtCond 14.030600 5.136901e-09
BldgType 13.011077 2.056736e-10
LandContour 12.850188 2.742217e-08
GarageQual 9.570389 1.240803e-07
GarageCond 9.541161 1.309714e-07
ExterCond 8.798714 5.106681e-07
LotConfig 7.809954 3.163167e-06
RoofMatl 6.727305 7.231445e-08
Condition1 6.118017 8.904549e-08
Fence 4.948159 2.312646e-03
Heating 4.259819 7.534721e-04
Functional 4.057875 4.841697e-04
BsmtFinType2 2.702450 1.941009e-02
Street 2.459290 1.170486e-01
MiscFeature 2.157324 1.047276e-01
Condition2 2.073899 4.342566e-02
LandSlope 1.958817 1.413964e-01
PoolQC 1.627469 3.039853e-01
Utilities 0.298804 5.847168e-01
MSSubClass NaN NaN
MoSold NaN NaN
YrSold NaN NaN
And the results from @kitman0804's method:
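import pandas as pd
import scipy.stats

def anova(data, x, y):
    # One group per distinct value of column x (unique() also returns NaN)
    x_val = data[x].unique()
    # y is read from df_data; the boolean mask built from data aligns with it on the index
    fstat = scipy.stats.f_oneway(*[df_data[y][data[x].isin([x_v])] for x_v in x_val])
    tbl = pd.DataFrame({'F-statistics': [fstat.statistic], 'P-value': [fstat.pvalue]})
    tbl.index = [x]
    return tbl

f2_table = pd.concat([anova(categorical_data, x, 'SalePrice') for x in categorical_data.columns])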
F-statistics P-value
ExterQual 443.334831 1.439551e-204
KitchenQual 407.806352 3.032213e-192
BsmtQual 316.148635 8.158548e-196
GarageFinish 213.867028 6.228747e-115
FireplaceQu 121.075121 2.971217e-107
Foundation 100.253851 5.791895e-91
CentralAir 98.305344 1.809506e-22
HeatingQC 88.394462 2.667062e-67
MasVnrType 84.672201 1.054025e-64
GarageType 80.379992 6.117026e-87
Neighborhood 71.784865 1.558600e-225
BsmtFinType1 64.688200 2.386358e-71
BsmtExposure 63.939761 7.557758e-50
SaleCondition 45.578428 7.988268e-44
MSZoning 43.840282 8.817634e-35
PavedDrive 42.024179 1.803569e-18
LotShape 40.132852 6.447524e-25
MSSubClass 33.732076 8.662166e-79
SaleType 28.863054 5.039767e-42
GarageQual 25.776093 5.388762e-25
GarageCond 25.750153 5.711746e-25
BsmtCond 19.708139 8.195794e-16
HouseStyle 19.595001 3.376777e-25
Exterior1st 18.611743 2.586089e-43
Electrical 18.460192 8.226925e-18
RoofStyle 17.805497 3.653523e-17
Exterior2nd 17.500840 4.842186e-43
Alley 15.176614 2.996380e-07
Fence 13.433276 9.379977e-11
BldgType 13.011077 2.056736e-10
LandContour 12.850188 2.742217e-08
PoolQC 10.509853 7.700989e-07
ExterCond 8.798714 5.106681e-07
LotConfig 7.809954 3.163167e-06
BsmtFinType2 7.565378 5.225649e-08
RoofMatl 6.727305 7.231445e-08
Condition1 6.118017 8.904549e-08
Heating 4.259819 7.534721e-04
Functional 4.057875 4.841697e-04
MiscFeature 2.593622 3.500367e-02
Street 2.459290 1.170486e-01
Condition2 2.073899 4.342566e-02
LandSlope 1.958817 1.413964e-01
MoSold 0.957865 4.833523e-01
YrSold 0.645525 6.300888e-01
Utilities 0.298804 5.847168e-01
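A likely cause of the differences is missing values. `data[x].unique()` includes `NaN`, and pandas' `Series.isin([np.nan])` matches `NaN` entries, so @kitman0804's function turns missing values into an extra group, whereas an equality test like `df_data["Fence"] == 'MnPrv'` never matches `NaN`. That fits the pattern in the two tables: ExterQual, KitchenQual and SaleCondition are identical, while Fence, Alley, PoolQC and FireplaceQu, where a missing value in this dataset typically means "no fence", "no alley" and so on, change. The sketch below reproduces the effect on hypothetical toy data using `groupby(..., dropna=False)` (pandas 1.1 or later); `groups_by` is an illustrative helper:

```python
import numpy as np
import pandas as pd
from scipy import stats

# Hypothetical toy data: NaN plays the role of "no fence"
df_demo = pd.DataFrame({
    "Fence": ["MnPrv", "GdWo", "MnPrv", "GdPrv", "GdWo", "GdPrv", np.nan, np.nan],
    "SalePrice": [140000, 120000, 150000, 210000, 125000, 205000, 180000, 185000],
})


def groups_by(df, feature, target="SalePrice", dropna=True):
    """Split target by the levels of feature; dropna=False keeps NaN as a level."""
    return [g[target].to_numpy() for _, g in df.groupby(feature, dropna=dropna)]


# Like the explicit == comparisons: rows with a missing Fence are excluded
without_nan = stats.f_oneway(*groups_by(df_demo, "Fence"))
# Like unique() + isin(): rows with a missing Fence form their own group
with_nan = stats.f_oneway(*groups_by(df_demo, "Fence", dropna=False))

print(without_nan.statistic, with_nan.statistic)
```

Neither number is wrong as such; which one fits depends on whether houses with a missing value should count as their own category. Filling the gaps first, e.g. with `fillna('None')`, makes that choice explicit for either method.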