I have a use case for FunctionTransformer in which the training examples need to be sorted, together with their true labels, by some criterion.

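from operator import itemgetter

def sort_examples(X, y=None):
    # Sort the rows of the sparse matrix X by their number of stored entries,
    # keeping each row's original index so the labels can be reordered too.
    Xt, indices = zip(*map(itemgetter(1, 2),
                      sorted([(x.nnz, x, i) for i, x in
                              enumerate(X)], key=itemgetter(0))))
    # Apply the same permutation to the labels, if any were passed.
    yt = [y[idx] for idx in indices] if y is not None else None
    return Xt, yt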

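from sklearn.dummy import DummyClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer

classifier = Pipeline(steps=[
    ('sorter', FunctionTransformer(func=sort_examples,
                                   validate=False,
                                   accept_sparse=True,
                                   pass_y=True)),
    ('classifier', DummyClassifier())])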

The problem arises when I embed the FunctionTransformer instance, which wraps my function with pass_y=True (since y needs to be transformed too), in a Pipeline: the Pipeline drops y by design, because it calls <FunctionTransformer instance>.fit(x, y).transform(x) and never returns a transformed y.
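To make the call pattern concrete, here is a minimal plain-Python sketch. It does not use scikit-learn at all: SortingTransformer and fit_transform_like_pipeline are hypothetical stand-ins that only imitate the fit(x, y).transform(x) sequence described above, and the examples are plain lists sorted by length rather than sparse rows sorted by nnz.

```python
class SortingTransformer:
    """Hypothetical stand-in for a transformer whose transform returns (Xt, yt)."""

    def fit(self, X, y=None):
        # Nothing to learn; return self so calls can be chained.
        return self

    def transform(self, X, y=None):
        # Sort the examples by length; reorder the labels only if they are given.
        order = sorted(range(len(X)), key=lambda i: len(X[i]))
        Xt = [X[i] for i in order]
        yt = [y[i] for i in order] if y is not None else None
        return Xt, yt


def fit_transform_like_pipeline(transformer, X, y):
    # Imitates the call sequence fit(x, y).transform(x): y reaches fit only.
    return transformer.fit(X, y).transform(X)


X = [[1, 1, 1], [1], [1, 1]]
y = ["c", "a", "b"]

Xt, yt = fit_transform_like_pipeline(SortingTransformer(), X, y)
print(Xt)  # the examples come back sorted by length
print(yt)  # the labels were never passed to transform
```

Because transform is called with x alone, the function never sees the labels, so it has nothing to reorder.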

As a consequence, the training examples are transformed and sorted, but the true labels are not reordered with them, so examples and labels no longer correspond.

My current workaround is to patch FunctionTransformer with a fit_transform method, so that FunctionTransformer.transform is no longer called explicitly by the Pipeline but implicitly inside the body of fit_transform, which forces y to be transformed as well.
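A rough sketch of that idea, again in plain Python rather than against the real FunctionTransformer: SortingTransformer is a hypothetical stand-in class, and its fit_transform simply forwards y to transform. How a real Pipeline would consume the returned pair is not shown here and is an open assumption.

```python
class SortingTransformer:
    """Hypothetical stand-in for the patched transformer described above."""

    def fit(self, X, y=None):
        # Nothing to learn; return self so calls can be chained.
        return self

    def transform(self, X, y=None):
        # Sort the examples by length and apply the same order to the labels.
        order = sorted(range(len(X)), key=lambda i: len(X[i]))
        Xt = [X[i] for i in order]
        yt = [y[i] for i in order] if y is not None else None
        return Xt, yt

    def fit_transform(self, X, y=None):
        # The patch: call transform from inside fit_transform and hand it y,
        # instead of letting the caller run fit(X, y).transform(X).
        return self.fit(X, y).transform(X, y)


X = [[1, 1, 1], [1], [1, 1]]
y = ["c", "a", "b"]

Xt, yt = SortingTransformer().fit_transform(X, y)
print(Xt)  # examples sorted by length
print(yt)  # labels reordered with the same permutation
```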

I’m not sure whether this use case is a legitimate one for what FunctionTransformer is designed for. I would be deeply grateful if any scikit-learn experts could suggest a better way to get the training examples and their corresponding labels transformed together in an automated pipeline.

ReneWang