
I want to classify text using sklearn. First I used a bag-of-words representation to train on the data, but the bag-of-words feature space is very large (more than 10,000 features), so I reduced it to 100 dimensions with SVD.
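
For reference, here is roughly what my current pipeline looks like (texts is a placeholder for my list of raw documents):

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import TruncatedSVD

vectorizer = CountVectorizer()
X_bow = vectorizer.fit_transform(texts)   # sparse matrix, >10,000 columns

svd = TruncatedSVD(n_components=100)      # reduce to 100 dimensions
X_svd = svd.fit_transform(X_bow)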

Now I want to add some other features, like the number of words, number of positive words, number of pronouns, etc. There are fewer than 10 of these additional features, which is very small compared to the 100 bag-of-words features.
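
This is roughly how I compute the extra features and append them to the SVD features (count_positive_words and count_pronouns are placeholders for my own helpers):

import numpy as np

extra = np.array([[len(doc.split()),          # number of words
                   count_positive_words(doc), # placeholder helper
                   count_pronouns(doc)]       # placeholder helper
                  for doc in texts])

X = np.hstack([X_svd, extra])  # 100 SVD columns + the extra columns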

This raises two questions:

  1. Is there a function in sklearn that can change the additional features' weights to make them more important?
  2. How do I check whether the additional features are actually important to the classifier?
HAO CHEN
  • Sounds like you can simply append your additional features to your SVD features along the 1st axis, then train a classifier on the resulting matrix. There are a number of classifiers which allow you to see the feature importances, e.g. GradientBoostingClassifier. I don't think you can change the features' importances after training the classifier; their importances will reflect their usefulness in predicting your y. – Ryan Nov 28 '15 at 16:38
  • Thanks. I mean, are there functions that test the similarity between a feature and the class? That is, before training the classifier, could I get a similarity ranking that tells me which features are important for classification? – HAO CHEN Nov 28 '15 at 17:41
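
One minimal sketch of that kind of pre-training ranking, assuming the combined matrix X, labels y, and column names X_cols from above, uses sklearn's univariate ANOVA scoring:

from sklearn.feature_selection import f_classif

# F-statistic between each feature column and the class labels;
# higher values suggest the feature separates the classes better
F_scores, p_values = f_classif(X, y)
for name, score in sorted(zip(X_cols, F_scores), key=lambda t: -t[1]):
    print(name, score)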

1 Answer


Although I'm very much interested, I don't know the answer to the first question. In the meantime, I can help with the second one.

After fitting a model, you can access the feature importances through the attribute model.feature_importances_

I use the following function to normalize the importances and display them in a prettier way.

import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns  # optional, only for nicer default plot styling

def showFeatureImportance(model, X_cols):
    # X_cols: the feature/column names, in the same order as the
    # columns of the matrix the model was trained on

    # Get the feature importances from the fitted classifier
    feature_importance = model.feature_importances_

    # Normalize the importances to a 0-100 scale
    feature_importance = 100.0 * (feature_importance / feature_importance.max())
    sorted_idx = np.argsort(feature_importance)
    pos = np.arange(sorted_idx.shape[0]) + .5

    # Plot the relative feature importance as a horizontal bar chart
    plt.figure(figsize=(12, 12))
    plt.barh(pos, feature_importance[sorted_idx], align='center', color='#7A68A6')
    plt.yticks(pos, np.asanyarray(X_cols)[sorted_idx])
    plt.xlabel('Relative Importance')
    plt.title('Feature Importance')
    plt.show()
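
For example, with the combined feature matrix X, labels y, and column names X_cols from the question (hypothetical names), the function could be used like this:

from sklearn.ensemble import GradientBoostingClassifier

model = GradientBoostingClassifier()
model.fit(X, y)
showFeatureImportance(model, X_cols)
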
fernandosjp