
I would like to use FunctionTransformer while providing a simple API and hiding the additional details. Specifically, I'd like to offer a Custom_Trans class as shown below. So, instead of trans1, which works fine, the user should be able to use trans2, which currently fails:

from sklearn import preprocessing 
from sklearn.pipeline import Pipeline
from sklearn import model_selection
from sklearn.linear_model import LinearRegression
from sklearn.datasets import make_regression
import numpy as np

X, y = make_regression(n_samples=100, n_features=1, noise=0.1)

def func(X, a, b):
    return X[:,a:b]

class Custom_Trans(preprocessing.FunctionTransformer):
    def __init__(self, ind0, ind1):
        super().__init__(
            func=func,
            kw_args={
                "a": ind0,
                "b": ind1
            }
        )

trans1 = preprocessing.FunctionTransformer( 
    func=func,
    kw_args={
        "a": 0,
        "b": 50
    }
)

trans2 = Custom_Trans(0,50)

pipe1 = Pipeline(
    steps=[
        ('custom', trans1),
        ('linear', LinearRegression())
    ]
)

pipe2 = Pipeline(
    steps=[
        ('custom', trans2),
        ('linear', LinearRegression())
    ]
)

print(model_selection.cross_val_score(pipe1, X, y, cv=3))

print(model_selection.cross_val_score(pipe2, X, y, cv=3))

This is what I get:

[0.99999331 0.99999671 0.99999772]
...sklearn/base.py:209: FutureWarning: From version 0.24, get_params will raise an
AttributeError if a parameter cannot be retrieved as an instance attribute. 
Previously it would return None.
warnings.warn('From version 0.24, get_params will raise an '
...
[0.99999331 0.99999671 0.99999772]

I kinda know that it's related to estimator cloning, but I don't know how to fix it. E.g. this post says that

there should be no logic, not even input validation, in an estimator's `__init__`. The logic should be put where the parameters are used, which is typically in `fit`

but in this case, I need to pass the parameters to the superclass. There is no way to put the logic in `fit()`. What can I do?
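The failure can be reproduced in isolation; a minimal sketch using the same `Custom_Trans` as above (the behavior differs by sklearn version, as the FutureWarning announces):

```python
from sklearn import preprocessing

def func(X, a, b):
    return X[:, a:b]

class Custom_Trans(preprocessing.FunctionTransformer):
    def __init__(self, ind0, ind1):
        super().__init__(func=func, kw_args={"a": ind0, "b": ind1})

trans2 = Custom_Trans(0, 50)
try:
    # older sklearn: returns {'ind0': None, 'ind1': None} with a FutureWarning,
    # because ind0/ind1 were never stored as instance attributes
    params_broken = all(v is None for v in trans2.get_params().values())
except AttributeError:
    # sklearn >= 0.24 raises AttributeError instead
    params_broken = True
print(params_broken)
```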

towi_parallelism
  • Just saw this note in the `BaseEstimator` _All estimators should specify all the parameters that can be set at the class level in their __init__ as explicit keyword arguments (no *args or **kwargs)._ – towi_parallelism May 27 '20 at 00:47

1 Answer


You can get `get_params` by inheriting from BaseEstimator:

class FunctionTransformer(BaseEstimator, TransformerMixin)

Related reading:

How to pass parameters to the customized ModelTransformer class

inherit from FunctionTransformer

custom transformers

You have this in `base.py`:

def get_params(self, deep=True):
    """
    Get parameters for this estimator.

    Parameters
    ----------
    deep : bool, default=True
        If True, will return the parameters for this estimator and
        contained subobjects that are estimators.

    Returns
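To see how `get_params` works, here is a minimal illustration (the `Demo` class is just an example): it discovers parameter names by inspecting the `__init__` signature and then reads instance attributes of the same names.

```python
from sklearn.base import BaseEstimator

class Demo(BaseEstimator):
    def __init__(self, alpha=1.0, beta=2.0):
        # get_params() finds 'alpha' and 'beta' in the __init__ signature,
        # then looks up self.alpha and self.beta by those exact names
        self.alpha = alpha
        self.beta = beta

params = Demo(alpha=5).get_params()
print(params)  # {'alpha': 5, 'beta': 2.0}
```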

Change your code:

trans1 = dict(
    functiontransformer__kw_args=[
        {'ind0': None},
        {'ind0': [1]}
    ]
)

class Custom_Trans(preprocessing.FunctionTransformer):
    def __init__(self, ind0, ind1, deep=True):
        super().__init__(func=func, kw_args={"a": ind0, "b": ind1})
        # store each constructor argument under its own name,
        # so get_params() can retrieve it
        self.ind0 = ind0
        self.ind1 = ind1
        self.deep = deep
Mahsa Hassankashi
  • thanks, but I'm already inheriting from `FunctionTransformer`, so `trans2` has `get_params()`; the problem is that it returns `{'ind0': None, 'ind1': None}` – towi_parallelism May 25 '20 at 20:02
  • 2
    I think the get_params() function looks at the __init__ arguments to figure out what the class parameters are, and then assumes they have the same names as the internal variables. – Mahsa Hassankashi May 25 '20 at 20:03
  • It is your problem at line 209: https://github.com/scikit-learn/scikit-learn/blob/94d8911310b7ec9cb6be2752d42b0cbd4c003c93/sklearn/base.py#L187 – Mahsa Hassankashi May 25 '20 at 20:05
  • true. I remember reading this before, so it's basically a bug in SciKit – towi_parallelism May 25 '20 at 20:05
  • Let's simply change your `__init__` – Mahsa Hassankashi May 25 '20 at 20:08
  • I did rename `a` and `b` to `ind0` and `ind1`, but still getting `None` and the same warning/error message – towi_parallelism May 25 '20 at 20:14
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/214615/discussion-between-mahsa-hassankashi-and-towi-parallelism). – Mahsa Hassankashi May 25 '20 at 20:15
  • thanks. I don't need the Base. Can you please update your answer? It should be like: ```class Custom_Trans(preprocessing.FunctionTransformer): def __init__(self, ind0, ind1): super().__init__( func=func, kw_args={ "a": ind0, "b": ind1 } ) self.ind0 = ind0 self.ind1 = ind1 ``` I shall accept your answer, but I'll just wait a few hours to see if someone from the SciKit can confirm that they'll change this method in the future. Thanks. – towi_parallelism May 25 '20 at 20:30
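For reference, the fix agreed in the comment above, written out as a runnable sketch (storing each `__init__` argument under its own name so cloning can rebuild the transformer):

```python
import numpy as np
from sklearn import preprocessing, model_selection
from sklearn.pipeline import Pipeline
from sklearn.linear_model import LinearRegression
from sklearn.datasets import make_regression

def func(X, a, b):
    return X[:, a:b]

class Custom_Trans(preprocessing.FunctionTransformer):
    def __init__(self, ind0, ind1):
        super().__init__(func=func, kw_args={"a": ind0, "b": ind1})
        # attributes named exactly like the __init__ arguments,
        # so get_params()/clone() can reconstruct the instance
        self.ind0 = ind0
        self.ind1 = ind1

X, y = make_regression(n_samples=100, n_features=1, noise=0.1)
pipe = Pipeline(steps=[
    ('custom', Custom_Trans(0, 50)),
    ('linear', LinearRegression())
])
scores = model_selection.cross_val_score(pipe, X, y, cv=3)
print(scores)  # no FutureWarning, cloning now works
```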