
I want to apply sample weights and, at the same time, use a scikit-learn Pipeline that first performs a feature transformation (e.g. polynomial features) and then applies a regressor (e.g. ExtraTrees).

I am using the following packages in the two examples below:

from sklearn.ensemble import ExtraTreesRegressor
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures

Everything works well as long as I transform the features separately and then create and train the model afterwards:

#Feature generation
X = np.random.rand(200,4)
Y = np.random.rand(200)

#Feature transformation
poly = PolynomialFeatures(degree=2)
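X_poly = poly.fit_transform(X)  # fit_transform returns a new array; X itself is not modified

#Model generation and fit (on the transformed features)
clf = ExtraTreesRegressor(n_estimators=5, max_depth=3)
weights = [1]*100 + [2]*100
clf.fit(X_poly, Y, sample_weight=weights)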
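PolynomialFeatures.fit_transform returns a new array rather than modifying X in place, so it is the returned array that has to be passed to the regressor. A minimal shape check on random data of the same size as in the question:

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

X = np.random.rand(200, 4)

poly = PolynomialFeatures(degree=2)
X_poly = poly.fit_transform(X)  # new array; X keeps its original 4 columns

# Degree-2 expansion of 4 features: 1 bias + 4 linear + 10 quadratic terms = 15 columns
print(X.shape)       # (200, 4)
print(X_poly.shape)  # (200, 15)
```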

But doing it in a pipeline does not work:

#Pipeline generation
pipe = Pipeline([('poly2', PolynomialFeatures(degree=2)), ('ExtraTrees', ExtraTreesRegressor(n_estimators=5, max_depth = 3))])

#Feature generation
X = np.random.rand(200,4)
Y = np.random.rand(200)

#Fitting model
clf = pipe
weights = [1]*100 + [2]*100
clf.fit(X,Y, weights)

I get the following error: TypeError: fit() takes at most 3 arguments (4 given). In this simple example it is no problem to modify the code, but since I want to run several different tests on my real data in my real code, being able to use pipelines together with sample weights would help a lot.
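The error comes from Pipeline.fit itself: it takes X and y positionally and accepts any further fit parameters only as keyword arguments (the **fit_params catch-all), so a third positional argument is rejected. A small sketch that reproduces the TypeError and inspects the signature; the exact error message and the name of the keyword catch-all differ between Python and scikit-learn versions, so neither is asserted here:

```python
import inspect

import numpy as np
from sklearn.ensemble import ExtraTreesRegressor
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures

pipe = Pipeline([
    ('poly2', PolynomialFeatures(degree=2)),
    ('ExtraTrees', ExtraTreesRegressor(n_estimators=5, max_depth=3)),
])
X = np.random.rand(200, 4)
Y = np.random.rand(200)
weights = [1] * 100 + [2] * 100

# Passing the weights as a third positional argument fails
try:
    pipe.fit(X, Y, weights)
    error_message = None
except TypeError as exc:
    error_message = str(exc)
print(error_message)

# Only X and y are positional; everything else lands in a **kwargs catch-all
params = list(inspect.signature(Pipeline.fit).parameters.values())
print([p.name for p in params])
```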

stefanE

1 Answer


The documentation of Pipeline's fit method mentions **fit_params. You have to specify which step of the pipeline each parameter should be passed to, which you do by following the naming rule from the docs:

For this, it enables setting parameters of the various steps using their names and the parameter name separated by a ‘__’, as in the example below.
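The same step-name__parameter-name convention is used by get_params and set_params for the steps' constructor parameters, which is a quick way to see which names a pipeline exposes. A short sketch with the pipeline from the question:

```python
from sklearn.ensemble import ExtraTreesRegressor
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures

pipe = Pipeline([
    ('poly2', PolynomialFeatures(degree=2)),
    ('ExtraTrees', ExtraTreesRegressor(n_estimators=5, max_depth=3)),
])

# '<step name>__<parameter name>' addresses a parameter of a single step
print(pipe.get_params()['ExtraTrees__max_depth'])  # 3

pipe.set_params(poly2__degree=3, ExtraTrees__max_depth=5)
print(pipe.named_steps['poly2'].degree)          # 3
print(pipe.named_steps['ExtraTrees'].max_depth)  # 5
```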

So all that being said, try changing the last line to:

clf.fit(X,Y, **{'ExtraTrees__sample_weight': weights})
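A self-contained sketch of the fix: it passes the weights with the keyword form ExtraTrees__sample_weight=weights, which is equivalent to unpacking the dictionary, and checks that the result matches doing the two steps by hand. The random_state=0 arguments are an addition (not in the original) so that both forests are built identically and their predictions can be compared:

```python
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.RandomState(0)
X = rng.rand(200, 4)
Y = rng.rand(200)
weights = np.array([1] * 100 + [2] * 100)

pipe = Pipeline([
    ('poly2', PolynomialFeatures(degree=2)),
    ('ExtraTrees', ExtraTreesRegressor(n_estimators=5, max_depth=3, random_state=0)),
])
# Same as pipe.fit(X, Y, **{'ExtraTrees__sample_weight': weights})
pipe.fit(X, Y, ExtraTrees__sample_weight=weights)

# The same two steps done by hand, with the weights passed straight to the regressor
X_poly = PolynomialFeatures(degree=2).fit_transform(X)
manual = ExtraTreesRegressor(n_estimators=5, max_depth=3, random_state=0)
manual.fit(X_poly, Y, sample_weight=weights)

same = np.allclose(pipe.predict(X), manual.predict(X_poly))
print(same)  # True
```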

This is a good example of how to work with parameters in pipelines.
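If the pipeline is built with make_pipeline instead of Pipeline, the steps are named automatically after their lowercased class names, so the prefix for the weights changes accordingly. A sketch under that assumption:

```python
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

X = np.random.rand(200, 4)
Y = np.random.rand(200)
weights = np.array([1] * 100 + [2] * 100)

pipe = make_pipeline(PolynomialFeatures(degree=2),
                     ExtraTreesRegressor(n_estimators=5, max_depth=3))

# make_pipeline names each step after its lowercased class name
step_names = [name for name, _ in pipe.steps]
print(step_names)  # ['polynomialfeatures', 'extratreesregressor']

pipe.fit(X, Y, extratreesregressor__sample_weight=weights)
```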

joeytwiddle
Kevin
    Thanks, Kevin! This solved the problem and the example is really nice to see how parameters work in pipelines! – stefanE Mar 28 '16 at 18:18