My data looks like this:

DATA | FEATURE1 | FEATURE2 | ... 
I    | 0.3213   | 1.231    | ...
A    | 5.0945   | 0.923    | ...
I    | 0.3213   | 0.761    | ...
...  | ...      | ....     | ...

I'm using this code:

import csv
import numpy as np
from sklearn.feature_selection import mutual_info_classif

def get_ranks(path_to_csv_file, feature_columns, label_column):
    # Read every row as strings and drop the header row (DATA, FEATURE1, ...)
    with open(path_to_csv_file, newline="") as f:
        rows = np.array(list(csv.reader(f))[1:])
    # Index columns (not rows) and convert the feature values to floats
    features = rows[:, feature_columns].astype(float)
    label = rows[:, label_column]
    return mutual_info_classif(features, label)

In Weka, all I had to do was choose InfoGainAttributeEval, and I got a ranked list of the features. For some reason, the code above doesn't give me the same ranking.
What is the problem?
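For comparison, Weka's documentation describes InfoGainAttributeEval as InfoGain(Class, Attribute) = H(Class) - H(Class | Attribute), which is defined on discrete values (as far as I know, Weka discretizes numeric attributes before computing it). Below is a minimal pure-Python sketch of that formula, assuming the features have already been turned into discrete values (the binning step is not shown). `entropy`, `info_gain` and `rank_by_info_gain` are hypothetical helpers written for illustration, not part of Weka or scikit-learn.

```python
import math
from collections import Counter

def entropy(values):
    # Shannon entropy in bits: -sum(p * log2(p)) over the observed values
    counts = Counter(values)
    total = len(values)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def info_gain(feature, label):
    # InfoGain(Class, Attribute) = H(Class) - H(Class | Attribute)
    total = len(label)
    conditional = 0.0
    for value in set(feature):
        # Class labels of the rows where this attribute takes `value`
        subset = [y for x, y in zip(feature, label) if x == value]
        conditional += (len(subset) / total) * entropy(subset)
    return entropy(label) - conditional

def rank_by_info_gain(columns, label):
    # columns: hypothetical dict mapping feature name -> list of discrete values
    scores = {name: info_gain(col, label) for name, col in columns.items()}
    # Highest information gain first, like a ranked attribute list
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Toy example with made-up discrete values
label = ["I", "A", "I", "A"]
columns = {"F1": ["lo", "hi", "lo", "hi"], "F2": ["x", "x", "y", "y"]}
print(rank_by_info_gain(columns, label))  # [('F1', 1.0), ('F2', 0.0)]
```

Because this works on discrete values while `mutual_info_classif` estimates mutual information for continuous features with a nearest-neighbor method, the two can order features differently on the raw numeric columns.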

0xhido