
I am getting the following warning message when trying to use Feature Selection and f_classif (ANOVA test) on some data in sklearn:

C:\Users\Alexander\Anaconda3\lib\site-packages\sklearn\feature_selection\univariate_selection.py:113: UserWarning: Features ... are constant. UserWarning)

The features that the warning message indicated were constant apparently had p-values of 0. I was unable to find any information about what was causing this warning. The github file for this particular function is here: https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/feature_selection/univariate_selection.py

Any help would be appreciated, thanks.

Vivek Kumar
Alex
  • Did you figure this out? I'm also getting this. – O.rka Nov 24 '16 at 21:25
  • This means that the `Features [indices of features]` are constant. Use X[indice] to see what's going on. I believe that these features are 0 for all samples. – seralouk Jul 06 '18 at 11:47
  • Can you be more specific about what "X[indice]" is here? – Long Feb 19 '20 at 11:30
  • @Long I guess seralouk is suggesting that you inspect the values of that feature to see whether it contains different values or the whole column is made up of the same value. X denotes the dataset, so X[indice] is one feature from the dataset. If all the values are the same, the feature will have no effect in your model. – Kattia Apr 22 '20 at 16:32

1 Answer


The indices in the warning refer to column positions in X. If X_train is a pandas DataFrame, you can look up the feature name by indexing its columns: X_train.columns[yourindex]
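As a minimal sketch of that lookup, assuming X_train is a pandas DataFrame: the data, the column names, and the reported index below are made up for illustration and are not from the question.

```python
import pandas as pd

# Hypothetical training data; column "b" holds the same value in every row.
X_train = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0],
    "b": [5.0, 5.0, 5.0, 5.0],
    "c": [0.0, 1.0, 0.0, 1.0],
})

# Assumption: the warning listed feature index 1 as constant.
reported_indices = [1]

# Map the positional indices from the warning to column names.
constant_names = list(X_train.columns[reported_indices])

# A constant column has exactly one distinct value.
n_unique = X_train[constant_names].nunique()

print(constant_names)
print(n_unique)
```

If `nunique()` reports 1 for a column, every sample has the same value for that feature, which is what the warning is pointing at.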

Then you can either drop this feature manually, or use VarianceThreshold to remove all zero-variance features:

    from sklearn.feature_selection import VarianceThreshold

    # Fit on the training data only; threshold=0 flags zero-variance columns.
    constant_filter = VarianceThreshold(threshold=0)
    constant_filter.fit(X_train)

    # get_support() is True for kept columns, so invert it to list the removed ones.
    constant_columns = X_train.columns[~constant_filter.get_support()]

    # Apply the fitted filter to both splits (returns NumPy arrays by default).
    X_train = constant_filter.transform(X_train)
    X_test = constant_filter.transform(X_test)

    for column in constant_columns:
        print("Removed ", column)

Determine the zero-variance features on the training dataframe only, because a feature that is constant in the training split may still take other values in the overall df. Then remove those features from both the training and test dataframes.
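One way to keep the "fit on train, apply to both" rule from being forgotten is to chain the filter and the ANOVA selection in a scikit-learn Pipeline. This is only a sketch under assumptions: the synthetic data, the choice of SelectKBest, and k=2 are illustrative and not from the question.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, VarianceThreshold, f_classif
from sklearn.pipeline import make_pipeline

# Synthetic data for illustration; feature 2 is constant in the training split.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(40, 4))
X_train[:, 2] = 1.0
y_train = np.array([0, 1] * 20)
X_test = rng.normal(size=(10, 4))

# Drop zero-variance features first, then score the remaining ones with f_classif.
selector = make_pipeline(
    VarianceThreshold(threshold=0.0),
    SelectKBest(f_classif, k=2),
)

# Both steps are fitted on the training data only and then reused on the test data.
X_train_sel = selector.fit_transform(X_train, y_train)
X_test_sel = selector.transform(X_test)

# Mask of features that survived the variance filter (False means removed).
kept_by_variance = selector.named_steps["variancethreshold"].get_support()
print(kept_by_variance)
print(X_train_sel.shape, X_test_sel.shape)
```

Because the constant column is removed before f_classif runs, the "Features ... are constant" warning should not be triggered by that column.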