I think I understand that until recently people used the attribute coef_ to extract the most informative features from linear models in python's machine learning library sklearn. Now users get pointed to SelectFromModel instead. SelectFromModel allows to reduce the features based on a threshold. So something like the following code reduces the features down to those features which have an importance > 0.5. My question now: Is there any way to determine whether a feature is positivly or negatively discriminating for a class?
I have my data in a pandas dataframe called data, first column a list of filenames of text files, second column the label.
count_vect = CountVectorizer(input="filename", analyzer="word")
X_train_counts = count_vect.fit_transform(data["filenames"])
print(X_train_counts.shape)
tf_transformer = TfidfTransformer(use_idf=True)
traindata = tf_transformer.fit_transform(X_train_counts)
print(traindata.shape) #report size of the training data
clf = LogisticRegression()
model = SelectFromModel(clf, threshold=0.5)
X_transform = model.fit_transform(traindata, data["labels"])
print("reduced features: ", X_transform.shape)
#get the names of all features
words = np.array(count_vect.get_feature_names())
#get the names of the important features using the boolean index from model
print(words[model.get_support()])