Scaling data in RFECV with scikit-learn

Question

It is common practice to scale the training and test data separately before fitting a classifier and making predictions.
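As a minimal sketch of that practice, the scaler can be fitted on the training split only and then applied to both splits. The synthetic data and the variable names here are assumptions for illustration, not part of the original problem.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the real data set (assumption).
X, y = make_classification(n_samples=200, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

# Learn the mean and standard deviation from the training split only...
scaler = StandardScaler().fit(X_train)

# ...and apply the same transformation to both splits.
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)
```

The test split never influences the scaling statistics, which is the property that should also hold inside each cross-validation fold.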

I want to embed this process in RFECV, which runs cross-validation internally, so I tried the following:

First, I ran X_scaled = preprocessing.scale(X) up front, where X is the whole data set. Done this way, the training and test data are not scaled separately, which is not what I want.

The other way I tried is to pass:

scaling_svm = Pipeline([('scaler', preprocessing.StandardScaler()),
                        ('svm', LinearSVC(penalty=penalty, dual=False, class_weight='balanced'))])

as the estimator argument of RFECV:

rfecv = RFECV(estimator=scaling_svm, step=1, cv=StratifiedKFold(n_splits=7),
              scoring=score, verbose=0)

However, I got an error because RFECV needs the estimator to expose a coef_ attribute, which Pipeline does not. What should I do? Any help would be appreciated.

score 0 · Answer 1 · answered Feb 13 '22 at 12:29

A bit late to the party, admittedly, but if anyone is interested you can create a customised pipeline as follows:

from sklearn.pipeline import Pipeline


class RfePipeline(Pipeline):
    # Expose the final step's coefficients so RFECV can rank features.
    @property
    def coef_(self):
        return self._final_estimator.coef_

And then replace Pipeline with RfePipeline in your code.
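For completeness, here is a minimal end-to-end sketch of how this might look. The synthetic data from make_classification, the 'l2' penalty, and the 'accuracy' scorer are assumptions standing in for the X, y, penalty and score of the question, and the calls use the current scikit-learn API (StratifiedKFold(n_splits=...), class_weight='balanced').

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFECV
from sklearn.model_selection import StratifiedKFold
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC


class RfePipeline(Pipeline):
    # Expose the final step's coefficients so RFECV can rank features.
    @property
    def coef_(self):
        return self._final_estimator.coef_


# Synthetic stand-in for the asker's X and y (assumption).
X, y = make_classification(n_samples=210, n_features=12, n_informative=4,
                           n_redundant=0, random_state=0)

scaling_svm = RfePipeline([
    ("scaler", StandardScaler()),
    ("svm", LinearSVC(penalty="l2", dual=False, class_weight="balanced")),
])

# RFECV clones and refits the whole pipeline on each training fold,
# so the scaler only ever sees that fold's training rows.
rfecv = RFECV(estimator=scaling_svm, step=1,
              cv=StratifiedKFold(n_splits=7), scoring="accuracy")
rfecv.fit(X, y)

print("features kept:", rfecv.n_features_)
print("selection mask:", rfecv.support_)
```

On scikit-learn 0.24 or later, a plain Pipeline should also work without the subclass if you pass importance_getter='named_steps.svm.coef_' to RFECV.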

See similar question here.
