I want to apply sample weights while using an sklearn pipeline that first performs a feature transformation (e.g. polynomial) and then applies a regressor (e.g. ExtraTrees).
I am using the following packages in the two examples below:
from sklearn.ensemble import ExtraTreesRegressor
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures
Everything works well as long as I transform the features separately and then generate and train the model:
#Feature generation
X = np.random.rand(200,4)
Y = np.random.rand(200)
#Feature transformation
poly = PolynomialFeatures(degree=2)
#Keep the transformed features so the model is actually trained on them
X_poly = poly.fit_transform(X)
#Model generation and fit
clf = ExtraTreesRegressor(n_estimators=5, max_depth = 3)
weights = [1]*100 + [2]*100
clf.fit(X_poly, Y, weights)
But doing the same in a pipeline does not work:
#Pipeline generation
pipe = Pipeline([('poly2', PolynomialFeatures(degree=2)), ('ExtraTrees', ExtraTreesRegressor(n_estimators=5, max_depth = 3))])
#Feature generation
X = np.random.rand(200,4)
Y = np.random.rand(200)
#Fitting model
clf = pipe
weights = [1]*100 + [2]*100
clf.fit(X,Y, weights)
I get the following error:
TypeError: fit() takes at most 3 arguments (4 given)
In this simple example, it is no issue to modify the code, but when I want to run several different tests on my real data in my real code, being able to use pipelines and sample weight
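
For reference, here is a minimal sketch of one way this might be made to work. It assumes the weights only need to reach the final regressor and relies on the `<step name>__<parameter>` keyword convention that `Pipeline.fit` accepts for fit parameters, so the step name `ExtraTrees` below must match the name given in the pipeline. The fixed seed and `random_state` are additions for reproducibility, not part of the original setup.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures

# Same toy data as above, seeded here only so the run is reproducible
rng = np.random.default_rng(0)
X = rng.random((200, 4))
Y = rng.random(200)
weights = np.array([1] * 100 + [2] * 100)

pipe = Pipeline([
    ('poly2', PolynomialFeatures(degree=2)),
    ('ExtraTrees', ExtraTreesRegressor(n_estimators=5, max_depth=3, random_state=0)),
])

# Route the weights to the step called 'ExtraTrees' by prefixing the
# fit parameter with the step name and a double underscore
pipe.fit(X, Y, ExtraTrees__sample_weight=weights)

predictions = pipe.predict(X)
```

The weights are passed by keyword rather than as a third positional argument, which is what triggers the TypeError above. PolynomialFeatures itself does not receive the weights in this sketch.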