I would like to use FunctionTransformer and at the same time provide a simple API that hides the additional details. Specifically, I'd like to be able to provide a Custom_Trans class as shown below. So, instead of trans1, which works fine, the user should be able to use trans2, which at the moment fails with the FutureWarning shown in the output below:
from sklearn import preprocessing
from sklearn.pipeline import Pipeline
from sklearn import model_selection
from sklearn.linear_model import LinearRegression
from sklearn.datasets import make_regression
import numpy as np
X, y = make_regression(n_samples=100, n_features=1, noise=0.1)
def func(X, a, b):
    return X[:, a:b]

class Custom_Trans(preprocessing.FunctionTransformer):
    def __init__(self, ind0, ind1):
        super().__init__(
            func=func,
            kw_args={
                "a": ind0,
                "b": ind1
            }
        )

# passing the parameters directly works fine
trans1 = preprocessing.FunctionTransformer(
    func=func,
    kw_args={
        "a": 0,
        "b": 50
    }
)
# the subclass triggers the warning below
trans2 = Custom_Trans(0, 50)

pipe1 = Pipeline(
    steps=[
        ('custom', trans1),
        ('linear', LinearRegression())
    ]
)
pipe2 = Pipeline(
    steps=[
        ('custom', trans2),
        ('linear', LinearRegression())
    ]
)

print(model_selection.cross_val_score(
    pipe1, X, y, cv=3)
)
print(model_selection.cross_val_score(
    pipe2, X, y, cv=3)
)
This is what I get:
[0.99999331 0.99999671 0.99999772]
...sklearn/base.py:209: FutureWarning: From version 0.24, get_params will raise an
AttributeError if a parameter cannot be retrieved as an instance attribute.
Previously it would return None.
warnings.warn('From version 0.24, get_params will raise an '
...
[0.99999331 0.99999671 0.99999772]
I know that it's related to estimator cloning, but I don't know how to fix it. For example, this post says that there should be no logic, not even input validation, in an estimator's __init__; the logic should go where the parameters are used, which is typically fit. But in this case I need to pass the parameters to the superclass's __init__, so there is no way to put that logic into fit().
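
If I understand correctly, cloning goes through get_params(), which inspects the signature of Custom_Trans.__init__ and then looks up instance attributes of the same names. Since my __init__ never stores ind0/ind1 on self, the lookup falls back to None. A minimal sketch of what I think happens during cross-validation (using the class defined above):

from sklearn import base

# get_params() finds ind0/ind1 in the __init__ signature, but the instance
# has no such attributes, so it warns (FutureWarning) and returns None for them
print(trans2.get_params())   # {'ind0': None, 'ind1': None}

# clone() then rebuilds the estimator from those None values,
# so the cloned transformer silently loses the slice bounds
cloned = base.clone(trans2)
print(cloned.kw_args)        # {'a': None, 'b': None}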
What can I do?