It is common to scale the training and test data separately before the training and prediction steps of a classification task.

I want to embed this process in RFECV, which runs cross-validation internally, so I tried the following:

First, I computed X_scaled = preprocessing.scale(X) up front, where X is the whole data set. With this approach the training and test data are not scaled separately, which is not what I want.
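
To make the distinction concrete, here is a minimal sketch of scaling the two parts separately. It assumes that "separately" means the scaler's mean and variance are estimated from the training part only and then reused on the test part; the synthetic data, the 70/30 split, and the variable names are illustrative assumptions, not taken from the original setup.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic data standing in for the real X and y (assumption for illustration).
rng = np.random.default_rng(0)
X = rng.normal(loc=5.0, scale=2.0, size=(100, 3))
y = rng.integers(0, 2, size=100)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# Estimate the scaling statistics from the training part only ...
scaler = StandardScaler().fit(X_train)

# ... and apply the same transformation to both parts.
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)
```

By contrast, preprocessing.scale(X) on the whole data set lets the test rows influence the mean and variance used for the training rows.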

The other way I tried is to pass:

scaling_svm = Pipeline([('scaler', preprocessing.StandardScaler()),
                        ('svm', LinearSVC(penalty=penalty, dual=False, class_weight='auto'))])

as the estimator argument of RFECV:

rfecv = RFECV(estimator=scaling_svm, step=1, cv=StratifiedKFold(y, 7),
              scoring=score, verbose=0)

However, I got an error because RFECV requires the estimator to have a .coef_ attribute. What should I do? Any help would be appreciated.

Francis

1 Answer

A bit late to the party, admittedly, but if anyone is interested you can create a customised pipeline as follows:

from sklearn.pipeline import Pipeline


class RfePipeline(Pipeline):
    # Expose the final step's coefficients so that RFECV can rank features.
    @property
    def coef_(self):
        return self._final_estimator.coef_

And then replace Pipeline with RfePipeline in your code.
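
For reference, here is a minimal end-to-end sketch of that replacement. It assumes a recent scikit-learn where StratifiedKFold takes n_splits and class_weight accepts 'balanced'; the synthetic data, penalty="l2", scoring="accuracy", and the helper name select_features are placeholders chosen for illustration, not values from the question.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFECV
from sklearn.model_selection import StratifiedKFold
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC


class RfePipeline(Pipeline):
    # Expose the final step's coefficients so that RFECV can rank features.
    @property
    def coef_(self):
        return self._final_estimator.coef_


def select_features(X, y, n_splits=7):
    # Hypothetical helper: scaler and SVM are refit together on every
    # training fold, so the scaler never sees the held-out fold.
    scaling_svm = RfePipeline([
        ("scaler", StandardScaler()),
        # penalty and class_weight values are assumptions for this sketch.
        ("svm", LinearSVC(penalty="l2", dual=False, class_weight="balanced")),
    ])
    rfecv = RFECV(
        estimator=scaling_svm,
        step=1,
        cv=StratifiedKFold(n_splits=n_splits),
        scoring="accuracy",  # placeholder for the question's `score`
    )
    return rfecv.fit(X, y)


# Synthetic data standing in for the real X and y.
X, y = make_classification(
    n_samples=140, n_features=10, n_informative=3, random_state=0
)
rfecv = select_features(X, y)
print("Selected features:", rfecv.n_features_)
print("Support mask:", rfecv.support_)
```

Because the scaler lives inside the pipeline, it is fit only on the training portion of each fold, which is the separate scaling the question asks for.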

See similar question here.

David M.