My data looks like this:
DATA | FEATURE1 | FEATURE2 | ...
I | 0.3213 | 1.231 | ...
A | 5.0945 | 0.923 | ...
I | 0.3213 | 0.761 | ...
... | ... | .... | ...
I'm using this code:
import csv
import numpy as np
from sklearn.feature_selection import SelectKBest, mutual_info_classif

def get_ranks(path_to_csv_file, feature_columns, label_column):
    # Read the CSV and drop the header row (DATA, FEATURE1, ...).
    with open(path_to_csv_file, newline="") as f:
        rows = np.array(list(csv.reader(f))[1:])
    # Index columns (not rows) and convert the feature values to floats.
    features = rows[:, feature_columns].astype(float)
    label = rows[:, label_column]
    mutual_info = mutual_info_classif(features, label)
    return mutual_info
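To compare against a ranked list, the scores still have to be sorted. Here is a minimal sketch of how I would do that, assuming the features are already a float array. The helper name `rank_features`, the toy data, and the fixed `random_state` are only illustrative and are not part of the code above.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif

def rank_features(features, labels, feature_names, random_state=0):
    """Return (name, score) pairs sorted from highest to lowest estimated MI."""
    # mutual_info_classif adds a little random noise to continuous features,
    # so a fixed random_state keeps the scores repeatable between runs.
    scores = mutual_info_classif(features, labels, random_state=random_state)
    order = np.argsort(scores)[::-1]  # indices from best to worst score
    return [(feature_names[i], float(scores[i])) for i in order]

# Toy data (made up): FEATURE2 separates the two classes, FEATURE1 is noise.
rng = np.random.default_rng(0)
labels = np.array(["I", "A"] * 50)
informative = np.where(labels == "I", 0.0, 5.0) + rng.normal(scale=0.1, size=100)
noise = rng.normal(size=100)
features = np.column_stack([noise, informative])

ranking = rank_features(features, labels, ["FEATURE1", "FEATURE2"])
```

With this toy data I would expect FEATURE2 to come first, since it is the only column that depends on the label.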
Using Weka, all I needed to do was choose InfoGainAttributeEval, and I got a ranked list of the features.
For some reason I don't get the same rank results using the above code.
What is the problem?