Is there a way to train a stacked regressor with scikit-learn such that a single final estimator is used to return multiple outputs?
I have been using sklearn.ensemble.StackingRegressor but, as indicated in the documentation of its .fit() method, it only accepts a target y of shape (n_samples,), so the final estimator can only return a single output, while I would need (n_samples, n_features).
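For instance, a minimal example along these lines (estimators and shapes are just placeholders) fails for me with a ValueError complaining that y should be a 1-d array:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor, StackingRegressor
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))  # (n_samples, n_features)
Y = rng.normal(size=(100, 5))  # multivariate target, (n_samples, n_features)

stack = StackingRegressor(
    estimators=[("ridge", Ridge()), ("rf", RandomForestRegressor())],
    final_estimator=Ridge(),
)
stack.fit(X, Y)  # raises: y should be a 1d array
```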
As a workaround, scikit-learn provides sklearn.multioutput.MultiOutputRegressor (as proposed here: Multioutput Stacking Regressor), which extends univariate models into multivariate ones by training a separate model for each output feature. I got this solution to run; however, I am not satisfied with it, as it takes too long to train on high-dimensional data and needlessly multiplies the number of parameters.
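Concretely, the workaround looks like this (reusing stack, X and Y from the example above):

```python
from sklearn.multioutput import MultiOutputRegressor

# One complete StackingRegressor is cloned and fitted per output column,
# so the number of fitted models grows linearly with the number of outputs.
multi_stack = MultiOutputRegressor(stack)
multi_stack.fit(X, Y)            # X: (nsmpls, nfeats), Y: (nsmpls, nfeats)
Y_pred = multi_stack.predict(X)  # (nsmpls, nfeats)
```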
Instead, I would like to use a single final estimator (e.g. a random forest) that can take a multivariate input and return a multivariate output. This would make the prediction pipeline straightforward and, I believe, faster to train.
The following illustration represents what I would like to achieve:
Set of observations 1 --> Base Model 1 --+
    (nsmpls, nfeats)                     |
                                         +--> Concatenated predictions --> Final model --> Final prediction
                                         |    (nsmpls, 2 * nfeats)                         (nsmpls, nfeats)
Set of observations 2 --> Base Model 2 --+
    (nsmpls, nfeats)
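To make the diagram concrete, here is a rough hand-rolled sketch of what I am after (random data just to keep it self-contained; cross_val_predict stands in for the out-of-fold predictions that StackingRegressor would generate internally, and a random forest serves as the single multivariate final estimator):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(0)
nsmpls, nfeats = 200, 4
X1 = rng.normal(size=(nsmpls, nfeats))  # observations for Base Model 1
X2 = rng.normal(size=(nsmpls, nfeats))  # observations for Base Model 2
Y = rng.normal(size=(nsmpls, nfeats))   # multivariate target

base1, base2 = Ridge(), RandomForestRegressor(n_estimators=50)

# Out-of-fold predictions to train the final model on, so the base models'
# in-sample fit does not leak into the final estimator.
P1 = cross_val_predict(base1, X1, Y, cv=5)  # (nsmpls, nfeats)
P2 = cross_val_predict(base2, X2, Y, cv=5)  # (nsmpls, nfeats)

# Refit the base models on the full data so they are usable on their own.
base1.fit(X1, Y)
base2.fit(X2, Y)

# A single multivariate final estimator on the concatenated predictions.
final = RandomForestRegressor(n_estimators=50)
final.fit(np.hstack([P1, P2]), Y)  # (nsmpls, 2 * nfeats) -> (nsmpls, nfeats)

# Prediction: run both base models, concatenate, then apply the final model.
Y_pred = final.predict(np.hstack([base1.predict(X1), base2.predict(X2)]))
```

This works, but I would prefer a proper scikit-learn estimator (with fit/predict, cloning and grid-search support) rather than gluing the steps together by hand.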
In this specific example the number of input and output features is identical for the base models, but the rationale would be the same in a different setup. Also, the sets of input features are different for each base model, but this is a separate issue that I solved using: How to use different feature matrices for sklearn.ensemble.StackingClassifier (with class inheritance)?
I could also train the final model directly on the concatenated inputs (as opposed to the concatenated predictions), without stacking, but I would also like to have these two base models trained, as they are of interest on their own. The pipeline I described here would kill two birds with one stone.