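import pandas as pd
from sklearn import linear_model as lm
from sklearn.feature_selection import RFECV, SelectPercentile
# sklearn.cross_validation was removed; StratifiedKFold now lives in model_selection
from sklearn.model_selection import StratifiedKFold

df = pd.read_csv('NCAA_2003-2016_with_diff.csv')
logreg = lm.LogisticRegression()
rfecv = RFECV(estimator=logreg, cv=10, scoring='?')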
The data has 914 rows and 191 columns, e.g.:
x = df[['diff_dist','team1_log5','tpp','orp','tempo','efg','ftr','blk']]
y = df['result']  # 1-D target; df[['result']] would be a one-column DataFrame
So there are many other candidate columns for 'x', and I am trying to select the most effective variables to predict result.
How can I write a for loop to do this?
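
For reference, here is a minimal sketch of what the RFECV setup above might look like once it is filled in. It is only an illustration under assumptions: the data is synthetic (generated with make_classification, since the CSV is not available here), the column names f0..f11 are hypothetical, and scoring='accuracy' is just one possible choice for the '?' in the post. RFECV does the elimination loop internally, so the only explicit for loop is the one that prints the resulting ranking.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFECV
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold

# Synthetic stand-in for the real CSV (assumption: binary 'result' column).
features, target = make_classification(
    n_samples=200, n_features=12, n_informative=4,
    n_redundant=0, random_state=0,
)
cols = [f"f{i}" for i in range(12)]  # hypothetical column names
df = pd.DataFrame(features, columns=cols)
df["result"] = target

# Use every column except the target as a candidate predictor.
x = df.drop(columns=["result"])
y = df["result"]

# Recursive feature elimination with cross-validation:
# drops one feature per step and keeps the subset with the best CV score.
rfecv = RFECV(
    estimator=LogisticRegression(max_iter=1000),
    step=1,
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0),
    scoring="accuracy",  # assumption: the original post left this as '?'
)
rfecv.fit(x, y)

# Names of the columns RFECV decided to keep.
selected = list(x.columns[rfecv.support_])
print("number of selected features:", rfecv.n_features_)
print("selected:", selected)

# Loop over all candidates in ranking order (rank 1 means selected).
for name, rank in sorted(zip(x.columns, rfecv.ranking_), key=lambda t: t[1]):
    print(name, rank)
```

With the real data, x would be built from the 190 non-target columns in the same way, provided they are all numeric and have no missing values.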