Sklearn Univariate Selection: Features are Constant

Question

I am getting the following warning message when trying to use Feature Selection and f_classif (ANOVA test) on some data in sklearn:

C:\Users\Alexander\Anaconda3\lib\site-packages\sklearn\feature_selection\univariate_selection.py:113: UserWarning: Features ... are constant. UserWarning)

The features that the warning message reported as constant apparently had p-values of 0. I was unable to find any information about what causes this warning. The GitHub file for this particular function is here: https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/feature_selection/univariate_selection.py

Any help would be appreciated, thanks.

This means that the `Features [indices of features]` are constant. Use X[indice] to see what's going on. I believe that these features are 0 for all samples. — seralouk, Jul 06 '18 at 11:47
@Long I guess seralouk is suggesting that you inspect the values of that feature to see whether it contains different values or whether the whole column is made up of the same value. X denotes the dataset, so X[indice] is one feature from the dataset. If you find that all the values are the same, the feature will have no effect on your model. — Kattia, Apr 22 '20 at 16:32
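
The inspection suggested in the comments could look roughly like the sketch below. It assumes X is a 2-D NumPy array with one feature per column; the data and the name constant_idx are made up for illustration.

```python
import numpy as np

# Hypothetical data: column 1 holds the same value in every sample.
X = np.array([[1.0, 0.0, 3.0],
              [2.0, 0.0, 1.0],
              [4.0, 0.0, 5.0]])

# Range (max - min) of each column; a range of 0 means the column is constant.
constant_idx = np.flatnonzero(np.ptp(X, axis=0) == 0)
print("Constant feature indices:", constant_idx)

# Look at the distinct values of each flagged column.
for idx in constant_idx:
    print(idx, np.unique(X[:, idx]))
```

Note that a feature is selected with X[:, idx] here, because indexing a NumPy array with a single integer returns a row rather than a column.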

score 8 · Answer 1 · answered Apr 28 '20 at 14:02

To find out which feature a reported index refers to, look it up in the columns of your X (assuming X_train is a pandas DataFrame): X_train.columns[yourindex]

Then you can either drop this feature manually, or you can use VarianceThreshold to remove all zero-variance features:

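    from sklearn.feature_selection import VarianceThreshold

    # Fit on the training data only, so the test set does not influence the selection.
    constant_filter = VarianceThreshold(threshold=0)
    constant_filter.fit(X_train)

    # Columns with zero variance in X_train, i.e. those the filter does not keep.
    constant_columns = [column for column in X_train.columns
                        if column not in X_train.columns[constant_filter.get_support()]]

    # By default transform() returns NumPy arrays without the constant columns.
    X_train = constant_filter.transform(X_train)
    X_test = constant_filter.transform(X_test)

    for column in constant_columns:
        print("Removed ", column)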

You have to determine the zero-variance features on the training dataframe only, because a feature that is constant there could still take more than one value in your overall df. Then remove those features from both dfs.
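
As a rough illustration of that point, here is a small self-contained sketch. The DataFrames, the labels y_train and the names kept and removed are made up for the example; only VarianceThreshold, get_support and f_classif come from scikit-learn. Column "b" is constant in the training rows but not in the test rows, so it is only detected when the filter is fitted on the training data.

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import VarianceThreshold, f_classif

# Hypothetical data: "b" never changes in the training rows, but does in the test rows.
X_train = pd.DataFrame({"a": [1.0, 2.0, 3.0, 4.0],
                        "b": [0.0, 0.0, 0.0, 0.0],
                        "c": [5.0, 3.0, 8.0, 1.0]})
X_test = pd.DataFrame({"a": [2.0, 5.0],
                       "b": [0.0, 1.0],
                       "c": [4.0, 7.0]})
y_train = np.array([0, 0, 1, 1])

# Fit the filter on the training data only.
selector = VarianceThreshold(threshold=0).fit(X_train)
mask = selector.get_support()
kept = X_train.columns[mask]
removed = X_train.columns[~mask]
print("Removed:", list(removed))

# Select by column name so both frames stay DataFrames with identical columns.
X_train_reduced = X_train[kept]
X_test_reduced = X_test[kept]

# With the constant column gone, the ANOVA F-test only sees varying features.
f_scores, p_values = f_classif(X_train_reduced, y_train)
print(dict(zip(kept, f_scores)))
```

Selecting by column name instead of calling transform is a design choice of this sketch: it keeps the column labels, which makes it easier to check that the same features were dropped from both dfs.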
