
I tried to emulate R's stepAIC function by doing it "manually", but it takes forever (I posted just the first two tries). Is there something similar to stepAIC in Python for logistic regression, i.e. a function that eliminates the variable with the highest p-value at each iteration and minimizes the AIC?

import numpy as np
import pandas as pd
import sklearn.preprocessing
import statsmodels.api as sm

# create the design matrix with all two-way interactions
datapol = data.drop(['flag'], axis=1)  # drop the flag column from the data
poly = sklearn.preprocessing.PolynomialFeatures(interaction_only=True, include_bias=False)

# calculate the AIC for the model with two-way interactions
m_sat = poly.fit_transform(datapol)
m1 = sm.Logit(np.asarray(flag.astype(int)), m_sat.astype(int))
res1 = m1.fit()
print(res1.summary2())

# create a new model without the variable whose p-value is > 0.05
mx1 = pd.DataFrame(m_sat)
mx2 = np.asarray(mx1.drop(mx1.columns[[3]], axis=1))
m2 = sm.Logit(np.asarray(flag.astype(int)), mx2.astype(int))
res2 = m2.fit()
print(res2.summary2())
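
For reference, the manual backward procedure I am trying to automate would look roughly like the sketch below (my own rough emulation of what I understand stepAIC to do; the function name and the p-value/AIC stopping rule are my guesses, reusing the m_sat matrix and flag target from above):

import pandas as pd
import statsmodels.api as sm

def backward_step_aic(X, y, p_threshold=0.05):
    # rough backward elimination: drop the highest-p-value term while the AIC improves
    X = pd.DataFrame(X)
    while X.shape[1] > 1:
        fit = sm.Logit(y, X).fit(disp=0)
        worst = fit.pvalues.idxmax()          # term with the highest p-value
        if fit.pvalues[worst] <= p_threshold:
            break                             # every remaining term looks significant
        reduced_fit = sm.Logit(y, X.drop(columns=[worst])).fit(disp=0)
        if reduced_fit.aic >= fit.aic:
            break                             # dropping it no longer lowers the AIC
        X = X.drop(columns=[worst])
    return X

# X_final = backward_step_aic(pd.DataFrame(m_sat), flag.astype(int))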

edit: I found an algorithm that emulates stepAIC in the forward direction: https://qiita.com/mytk0u0/items/aa2e3f5a66fe9e2895fa
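
A rough sketch of that forward-direction idea (my own reading of the approach, not the code from the link; forward_step_aic is a made-up name):

import pandas as pd
import statsmodels.api as sm

def forward_step_aic(X, y):
    # greedy forward selection: at each step add the column that lowers the AIC the most
    X = pd.DataFrame(X)
    selected, best_aic = [], float("inf")
    improved = True
    while improved and len(selected) < X.shape[1]:
        improved = False
        for col in X.columns.difference(selected):
            aic = sm.Logit(y, X[selected + [col]]).fit(disp=0).aic
            if aic < best_aic:
                best_aic, best_col, improved = aic, col, True
        if improved:
            selected.append(best_col)
    return selected, best_aic

# selected_cols, aic = forward_step_aic(pd.DataFrame(m_sat), flag.astype(int))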

Porridge
2 Answers


Check out the RFE (recursive feature elimination) class from the sklearn package.

from sklearn.linear_model import LinearRegression
from sklearn.feature_selection import RFE

# run RFE, keeping 9 features
lm = LinearRegression()
rfe = RFE(lm, n_features_to_select=9)
rfe = rfe.fit(X_train, y_train)
print(rfe.support_)    # boolean mask of the selected features
print(rfe.ranking_)    # feature ranking (1 = selected)
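
Since the question is about logistic regression, the same pattern should also work with LogisticRegression as the estimator (a minimal sketch, assuming X_train/y_train hold your design matrix and flag):

from sklearn.linear_model import LogisticRegression
from sklearn.feature_selection import RFE

# RFE works with any estimator that exposes coef_ or feature_importances_
logit = LogisticRegression(max_iter=1000)
rfe = RFE(logit, n_features_to_select=9)
rfe = rfe.fit(X_train, y_train)
print(rfe.support_)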
Rakesh SK
    I found this slightly different, as stepAIC finds the optimal number of predictors itself, while with RFE the user needs to specify the number of features to select. https://scikit-learn.org/stable/modules/generated/sklearn.feature_selection.RFE.html – frank Jul 03 '19 at 13:30
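
If you don't want to fix the number of features up front, sklearn also has RFECV, which picks it by cross-validation (a minimal sketch, reusing the X_train/y_train names from the answer):

from sklearn.linear_model import LogisticRegression
from sklearn.feature_selection import RFECV

# RFECV chooses the number of features itself via cross-validation
rfecv = RFECV(LogisticRegression(max_iter=1000), step=1, cv=5)
rfecv = rfecv.fit(X_train, y_train)
print(rfecv.n_features_)   # number of features kept
print(rfecv.support_)      # boolean mask of the selected features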

stepAIC just searches for the combination of features that reduces the AIC: the lower the AIC, the better. So I think if you have a fixed number of features that you want, you can just explicitly compare the AIC values using OLS:

import statsmodels.api as sm

# x is the design matrix; swap its columns to try different feature sets
regressor_OLS = sm.OLS(Y, x).fit()
regressor_OLS.summary()
regressor_OLS.aic  # returns the AIC value
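
If the number of candidate features is small, one way to make that comparison explicit (a rough sketch; best_aic_subset is a made-up name and X is assumed to be a DataFrame of predictors) is to loop over the subsets of a given size and keep the one with the lowest AIC:

from itertools import combinations
import statsmodels.api as sm

def best_aic_subset(X, Y, k):
    # fit an OLS model for every k-column subset of X and keep the lowest AIC
    best = None
    for cols in combinations(X.columns, k):
        aic = sm.OLS(Y, sm.add_constant(X[list(cols)])).fit().aic
        if best is None or aic < best[1]:
            best = (cols, aic)
    return best

# cols, aic = best_aic_subset(X, Y, k=3)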
katie