
I asked this on Cross Validated before, but it seems more appropriate to ask it here.

My data df_X has 11 features, and y is the multi-class label (the values 3, 4, 5, 6, 7, 8 appear in the samples). I used a multi-class SVM inside SelectFromModel to rank the importance of the features. I expected estimator_.coef_ to return one score per feature (a list of 11 scores), but instead it returns a 2D array with several rows of scores. Why? The same thing happens with a multi-class LogisticRegression().

By the way, what is the difference between SelectKBest and SelectFromModel for feature selection in sklearn?

[screenshot: the returned estimator_.coef_ output, a 2D array with 11 columns]


1 Answer


I tried to reproduce your case using the iris dataset, which has 4 features; you can also extend this experiment to the diabetes dataset, which has 10 features and is closer to your data, as shown below:

#Read and load a sample dataset
from sklearn.datasets import load_iris, load_diabetes

iris = load_iris()
X, y = iris.data, iris.target

#Data set characteristics
#print(iris.DESCR)

#Fit SVC to the data and extract the results of feature selection
from sklearn.feature_selection import SelectFromModel
from sklearn.svm import SVC
from time import time

tic = time()
selector = SelectFromModel(estimator=SVC(kernel='linear')).fit(X, y)
toc = time()

#Plot the results
import matplotlib.pyplot as plt
import numpy as np

importance = np.abs(selector.estimator_.coef_)  #one row of coefficients per pair of classes
feature_names = np.array(iris.feature_names)
print(f"Features selected by SelectFromModel: {feature_names[selector.get_support()]}")
#-->Features selected by SelectFromModel: ['petal length (cm)' 'petal width (cm)']
print(f"Done in {toc - tic:.3f}s")  #-->Done in 0.002s
#print(X.shape)                     #-->(150, 4)
#print(selector.transform(X).shape) #-->(150, 2)
plt.bar(height=importance[1], x=feature_names)  #plot the second row of coefficients
plt.xticks(rotation=45, ha='right')
plt.title("Feature importances via coefficients")
plt.show()

Inspired by this post, I reflected the results in a plot: img

print(selector.threshold_)
print(selector.estimator_.coef_)
print(selector.estimator_.coef_.shape) #-->(3, 4)

#2.1645593987132914
#[[-0.04625854  0.5211828  -1.00304462 -0.46412978]
# [-0.00722313  0.17894121 -0.53836459 -0.29239263]
# [ 0.59549776  0.9739003  -2.03099958 -2.00630267]]
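
As a sanity check (my own addition, reusing the selector fitted above), the printed threshold can be reproduced by hand, assuming SelectFromModel's default behaviour: with a 2D coef_, each feature's importance is the sum of the absolute coefficients in its column, and the default threshold is the mean of those per-feature importances:

#Hedged check: reproduce selector.threshold_ from the fitted coefficients
per_feature_importance = np.abs(selector.estimator_.coef_).sum(axis=0)
print(per_feature_importance)                         #-->one importance score per feature, shape (4,)
print(per_feature_importance.mean())                  #-->2.1645593987132914, i.e. selector.threshold_
print(per_feature_importance >= selector.threshold_)  #-->[False False  True  True], mirrors get_support()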

Why does sklearn SelectFromModel estimator_.coef_ return a 2D array?

If you check the input and output of the selector model, both are 2-dimensional arrays. A 2D coef_ makes sense for selector.estimator_.coef_: SelectFromModel() is a meta-transformer, and the underlying SVC(kernel='linear') handles the multi-class problem with a one-vs-one scheme, so coef_ contains one row of coefficients per pair of classes (3 pairs for the 3 iris classes, hence the (3, 4) shape). The selector then reduces this matrix to a single importance weight per feature.

print(X.ndim)                         #-->2
print(selector.estimator_.coef_.ndim) #-->2
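
As a side note, here is a minimal sketch (my own illustration, relying on scikit-learn's documented coef_ shapes) of where the number of rows comes from: a linear SVC is trained one-vs-one, so it gets one row per pair of classes, while LogisticRegression gets one row per class:

from itertools import combinations
from sklearn.linear_model import LogisticRegression

svc = SVC(kernel='linear').fit(X, y)
print(svc.coef_.shape)                      #-->(3, 4): C(3, 2) = 3 class pairs, 4 features
print(list(combinations(np.unique(y), 2)))  #-->[(0, 1), (0, 2), (1, 2)]

logreg = LogisticRegression(max_iter=1000).fit(X, y)
print(logreg.coef_.shape)                   #-->(3, 4): one row per class

For iris the two shapes coincide because C(3, 2) = 3; for the 6 classes and 11 features in your data, the same rule would give a (15, 11) coef_ for SVC(kernel='linear') and a (6, 11) coef_ for LogisticRegression().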

Also, based on this post, you can check whether there is a zero vector in the matrix, or simply use get_support() to get a boolean array indicating which features were selected:

X_new = selector.transform(X) 
print(X_new.shape)            #-->(150, 2)
print(selector.get_support()) #-->[False False  True  True]

By the way, what is the difference between SelectKBest and SelectFromModel for feature selection in sklearn?

Based on the documentation of SelectKBest(), it selects features according to the k highest scores: it "removes all but the k highest scoring features" (Reference). It is commonly used with chi2() to "compute chi-squared stats between each non-negative feature and class". In short, SelectKBest ranks features with a univariate scoring function and keeps the top k, whereas SelectFromModel keeps the features whose model-based importance (coef_ or feature_importances_) exceeds a threshold. Here is a good starting point for the comparison.
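
For completeness, here is a minimal SelectKBest sketch on the same iris data (my own illustration, not part of the original workflow), so you can compare it with the SelectFromModel result above:

from sklearn.feature_selection import SelectKBest, chi2

kbest = SelectKBest(score_func=chi2, k=2).fit(X, y)  #score each feature independently of any model
print(kbest.scores_)                                 #-->one chi2 score per feature, shape (4,)
print(feature_names[kbest.get_support()])
#-->['petal length (cm)' 'petal width (cm)'], the same two features SelectFromModel picked above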

Edit: I also found a workaround here about using regularisation to remove non-important features from the dataset, played with coef_, and updated the post:

selector.transform(X)
#print(selector.transform(X))

features = np.array(iris.feature_names)
print("All features:", features)
#All features: ['sepal length (cm)' 'sepal width (cm)' 'petal length (cm)' 'petal width (cm)']

status = selector.get_support()   #boolean mask of the selected features
print("Selected features:", features[status])
#Selected features: ['petal length (cm)' 'petal width (cm)']

# note here the absolute transformation before the mean
print("absolute transformation before the mean:", abs(selector.estimator_.coef_).mean()*1.25)
#absolute transformation before the mean: 1.3025301767884683

print('features with coefficients shrank to zero: {}'.format(np.sum(selector.estimator_.coef_ == 0)))
#features with coefficients shrank to zero: 0
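
Since no coefficient shrinks to exactly zero with the plain linear SVC above, here is a hedged sketch (my own addition, assuming an L1-penalised LinearSVC with a small C, the usual way to obtain sparse coefficients) of how a regularised estimator inside SelectFromModel can actually zero out the weakest features:

from sklearn.svm import LinearSVC

#penalty='l1' requires dual=False; a smaller C shrinks coefficients more aggressively
l1_svc = LinearSVC(penalty='l1', dual=False, C=0.05, max_iter=10000)
l1_selector = SelectFromModel(estimator=l1_svc).fit(X, y)

print(l1_selector.estimator_.coef_)               #some entries are expected to be exactly 0
print(np.sum(l1_selector.estimator_.coef_ == 0))  #count of coefficients shrunk to zero
print(l1_selector.get_support())                  #boolean mask of the surviving features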
  • Sorry, I still don't understand the meaning of the rows. Did you choose the second row as the importance of each feature? And if it were pairwise between features, the matrix should be 4*4, so why is coef_ 3*4? – user6703592 Jan 09 '23 at 13:47
  • It seems `.coef_` returns coefficients for each feature, and their sign (negative or positive) matters for the sense of feature importance. [Ref](https://stackoverflow.com/questions/37961163/how-to-use-selectfrommodel-in-sklearn-to-find-the-positively-informative-feature/37974472#37974472) Note that the selector's estimation/scoring of the coefficients depends on the *threshold*; see this [post](https://stackoverflow.com/questions/64581307/how-to-properly-do-feature-selection-with-selectfrommodel-from-scikit-learn) and its [calibration](https://stackoverflow.com/a/49345882/10452700) process. – Mario Jan 09 '23 at 15:57
  • Back to the question in your comment: perhaps `.coef_` is not based on a pairwise covariance calculation among the features. I couldn't find any other valid documentation explaining the row part `#-->(3?, 4)` of `print(selector.estimator_.coef_.shape)`; the number of columns equals the number of features (4 here), which is clear, but I couldn't figure out why there are 3 rows. Also, in your example case in the screenshot, you have `(n, 11)`. – Mario Jan 09 '23 at 16:10