sklearn.feature_selection and RFECV

Asked Mar 10 '17 at 21:47

Active Mar 11 '17 at 08:30

Viewed 486 times

import pandas as pd
from sklearn import linear_model as lm
from sklearn.model_selection import StratifiedKFold
from sklearn.feature_selection import RFECV, SelectPercentile

a = pd.read_csv('NCAA_2003-2016_with_diff.csv')

logreg = lm.LogisticRegression()

rfecv = RFECV(estimator=logreg, cv=10, scoring='?')

The data set has 914 rows and 191 columns, for example:

x = df[['diff_dist','team1_log5','tpp','orp','tempo','efg','ftr','blk']]
y = df[['result']]

That is, there are many other candidate 'x' columns, and I am trying to select the most effective variables for predicting result.

How can I write a for loop to do this?

edited Mar 11 '17 at 08:30

Vivek Kumar


asked Mar 10 '17 at 21:47

Hong


To clarify, is 'x' your features and you want to know how to do feature selection? Where does 'df' come from? – Cecilia Mar 10 '17 at 22:12
You need to describe more about the data and what you want to do. – Vivek Kumar Mar 11 '17 at 02:35
x are the features and y is the response variable. I want to select several features from the more than 100 in my data set based on the regression model; the metric could be 'mean squared error' or 'F-score'. Is that clearer now? – Hong Mar 11 '17 at 22:47
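For what the comments describe, an explicit for loop is probably not needed: RFECV drops features step by step and cross-validates each subset itself. Below is a minimal sketch, not a tested solution for the NCAA file. The synthetic data from make_classification is a stand-in (an assumption) for the real table, and scoring='f1' assumes result is a binary label; 'mean squared error' is a regression metric and would not normally be used with LogisticRegression.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFECV
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold

# Synthetic stand-in for the real data (assumption): 300 rows, 20 features.
# With the real table, X would be the candidate feature columns and
# y the 'result' column as a 1-D array.
X, y = make_classification(n_samples=300, n_features=20, n_informative=5,
                           n_redundant=2, random_state=0)

logreg = LogisticRegression(max_iter=1000)
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)

# step=1 removes one feature per iteration; each subset is scored
# with 10-fold cross-validated F1.
rfecv = RFECV(estimator=logreg, step=1, cv=cv, scoring='f1')
rfecv.fit(X, y)

# support_ is a boolean mask over the input columns; ranking_ is 1 for kept features.
selected = [i for i, keep in enumerate(rfecv.support_) if keep]
print("number of features kept:", rfecv.n_features_)
print("selected column indices:", selected)
```

With a DataFrame, the same mask can be applied to the column names, for example X.columns[rfecv.support_], to see which variables were kept.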


0 Answers