sklearn Stacking Estimator passthrough skips preprocessing and passes original data

Question

This issue has been raised on GitHub, but there have been no comments: https://github.com/scikit-learn/scikit-learn/issues/16473

I have some numerical features and some categorical features in X. The categorical features are one-hot encoded, so my pipeline is similar to the example in the sklearn docs:

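from sklearn.compose import make_column_transformer
from sklearn.ensemble import RandomForestRegressor, StackingRegressor
from sklearn.experimental import enable_hist_gradient_boosting  # noqa: only needed on scikit-learn < 1.0
from sklearn.ensemble import HistGradientBoostingRegressor
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LassoCV, RidgeCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# cat_cols, num_cols, categories and processor_nlin (the preprocessor used by
# the tree-based models) are defined elsewhere and not shown here.

cat_proc_lin = make_pipeline(
    SimpleImputer(missing_values=None,
                  strategy='constant',
                  fill_value='missing'),
    OneHotEncoder(categories=categories)
)

num_proc_lin = make_pipeline(
    SimpleImputer(strategy='mean'),
    StandardScaler()
)

processor_lin = make_column_transformer(
    (cat_proc_lin, cat_cols),
    (num_proc_lin, num_cols),
    remainder='passthrough')

lasso_pipeline = make_pipeline(processor_lin,
                               LassoCV())

rf_pipeline = make_pipeline(processor_nlin,
                            RandomForestRegressor(random_state=42))

gradient_pipeline = make_pipeline(
    processor_nlin,
    HistGradientBoostingRegressor(random_state=0))

estimators = [('Random Forest', rf_pipeline),
              ('Lasso', lasso_pipeline),
              ('Gradient Boosting', gradient_pipeline)]

stacking_regressor = StackingRegressor(estimators=estimators,
                                       final_estimator=RidgeCV())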

But if I set passthrough=True, fitting fails, because passthrough hands the original X to the final estimator and skips the preprocessing part of each pipeline:

/usr/local/lib/python3.6/dist-packages/sklearn/model_selection/_validation.py:536: FitFailedWarning: Estimator fit failed. The score on this train-test partition for these parameters will be set to nan. Details: 
ValueError: could not convert string to float: 'RL'
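A minimal reproduction of what the final estimator receives, on a small synthetic DataFrame (the columns `zone` and `area` and the data are made up for illustration). With passthrough=True, StackingRegressor appends the *raw* X after the base-model predictions, so RidgeCV gets an object array that still contains strings and fails while converting it to floats.

```python
import numpy as np
import pandas as pd
from sklearn.compose import make_column_transformer
from sklearn.ensemble import StackingRegressor
from sklearn.linear_model import LassoCV, RidgeCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy data: one categorical and one numerical column.
rng = np.random.RandomState(0)
X = pd.DataFrame({
    "zone": rng.choice(["RL", "RM", "FV"], size=100),
    "area": rng.uniform(50, 200, size=100),
})
y = 3 * X["area"] + rng.normal(size=100)

lasso_pipeline = make_pipeline(
    make_column_transformer(
        (OneHotEncoder(handle_unknown="ignore"), ["zone"]),
        (StandardScaler(), ["area"]),
    ),
    LassoCV(),
)
stack = StackingRegressor(
    estimators=[("Lasso", lasso_pipeline)],
    final_estimator=RidgeCV(),
    passthrough=True,
)

error = None
try:
    stack.fit(X, y)
except (ValueError, TypeError) as exc:
    # RidgeCV receives rows like [prediction, 'RL', 123.4]: the unencoded
    # columns of X are appended after the base-model predictions.
    error = exc

print(type(error).__name__, error)
```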

Is there any way to make passthrough include the preprocessing part of the pipeline, i.e. pass the preprocessed features instead of the raw X?
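A sketch of one common workaround, assuming all base models can share a single preprocessor (the question's setup, with separate processor_lin and processor_nlin, may not allow this): run the preprocessing once in a pipeline in front of the *whole* StackingRegressor, so that passthrough=True forwards already-encoded features. The toy data and column names are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.compose import make_column_transformer
from sklearn.ensemble import RandomForestRegressor, StackingRegressor
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LassoCV, RidgeCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy data with a missing numeric value every 10 rows.
rng = np.random.RandomState(0)
n = 200
X = pd.DataFrame({
    "zone": rng.choice(["RL", "RM", "FV"], size=n),
    "area": rng.uniform(50, 200, size=n),
})
X.loc[X.index[::10], "area"] = np.nan
y = (3 * X["area"].fillna(100)
     + X["zone"].map({"RL": 0.0, "RM": 50.0, "FV": 100.0})
     + rng.normal(size=n))

# One shared preprocessor, in the spirit of processor_lin from the question.
preprocessor = make_column_transformer(
    (make_pipeline(SimpleImputer(strategy="constant", fill_value="missing"),
                   OneHotEncoder(handle_unknown="ignore")), ["zone"]),
    (make_pipeline(SimpleImputer(strategy="mean"), StandardScaler()), ["area"]),
    sparse_threshold=0,  # dense output, so every downstream model accepts it
)

# The base models contain no preprocessing of their own. Because the
# preprocessor runs before the stack, passthrough=True now appends the
# encoded features (not the raw strings) to the base-model predictions.
stack = StackingRegressor(
    estimators=[("Random Forest", RandomForestRegressor(n_estimators=50, random_state=42)),
                ("Lasso", LassoCV())],
    final_estimator=RidgeCV(),
    passthrough=True,
)
model = make_pipeline(preprocessor, stack)
model.fit(X, y)
preds = model.predict(X)
```

The trade-off is that the linear and tree-based models now see identical features, so separate per-model preprocessors cannot be kept this way.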

I also cannot put the preprocessing pipeline in front of the final estimator, because with passthrough=True the original X DataFrame is concatenated with the base models' predictions into a NumPy array (as mentioned in the GitHub discussion linked at the top of this post), and my actual preprocessing pipeline has several custom transformers that operate on pandas DataFrames.
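If each base model needs its own DataFrame-based preprocessing, another sketch is to keep the base pipelines unchanged and make the final estimator a pipeline whose first step rebuilds a DataFrame from the array it receives. This relies on the passthrough columns coming *after* the base-model predictions (one column per estimator for single-output regression) in their original order. `rebuild_frame`, the `pred_i` column names and the toy data are hypothetical; `infer_objects()` restores the numeric dtypes lost when the DataFrame was stacked into an object array. Custom DataFrame transformers would replace the ColumnTransformer in `make_preprocessor`.

```python
import numpy as np
import pandas as pd
from sklearn.compose import make_column_transformer
from sklearn.ensemble import RandomForestRegressor, StackingRegressor
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LassoCV, RidgeCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import FunctionTransformer, OneHotEncoder, StandardScaler

CAT_COLS, NUM_COLS = ["zone"], ["area"]  # hypothetical column names


def make_preprocessor():
    # Stand-in for DataFrame-based preprocessing; columns not listed here
    # (e.g. the prediction columns) are passed through unchanged.
    return make_column_transformer(
        (make_pipeline(SimpleImputer(strategy="constant", fill_value="missing"),
                       OneHotEncoder(handle_unknown="ignore")), CAT_COLS),
        (make_pipeline(SimpleImputer(strategy="mean"), StandardScaler()), NUM_COLS),
        remainder="passthrough",
        sparse_threshold=0,
    )


def rebuild_frame(X, n_preds, columns):
    """Hypothetical helper: split [predictions | raw X] back into a DataFrame."""
    arr = np.asarray(X, dtype=object)
    frame = pd.DataFrame(arr[:, n_preds:], columns=columns)
    for i in range(n_preds):
        frame[f"pred_{i}"] = arr[:, i].astype(float)
    # Stacking into an object array lost the numeric dtypes; restore them.
    return frame.infer_objects()


# Hypothetical toy data with a missing numeric value every 10 rows.
rng = np.random.RandomState(0)
n = 200
X = pd.DataFrame({
    "zone": rng.choice(["RL", "RM", "FV"], size=n),
    "area": rng.uniform(50, 200, size=n),
})
X.loc[X.index[::10], "area"] = np.nan
y = (3 * X["area"].fillna(100)
     + X["zone"].map({"RL": 0.0, "RM": 50.0, "FV": 100.0})
     + rng.normal(size=n))

estimators = [
    ("Random Forest", make_pipeline(make_preprocessor(),
                                    RandomForestRegressor(n_estimators=50, random_state=42))),
    ("Lasso", make_pipeline(make_preprocessor(), LassoCV())),
]
final_estimator = make_pipeline(
    FunctionTransformer(rebuild_frame,
                        kw_args={"n_preds": len(estimators), "columns": list(X.columns)}),
    make_preprocessor(),
    RidgeCV(),
)
stack = StackingRegressor(estimators=estimators,
                          final_estimator=final_estimator,
                          passthrough=True)
stack.fit(X, y)
preds = stack.predict(X)
```

Note that the passthrough features are preprocessed a second time inside the final estimator, so the work is repeated rather than shared with the base pipelines.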

Thank you for any help!

I just hit the exact same issue and, judging from the error, while working on the same dataset. :) Any solutions would be appreciated. — Florin Andrei, Aug 11 '23 at 20:15
