
I have created a pipeline that looks like this:

    Pipeline(steps=[('preprocessor',
                     ColumnTransformer(transformers=[('numerical_transform',
                                                      RobustScaler(),
                                                      ['powerPS', 'kilometer',
                                                       'age_of_car']),
                                                     ('categorical_transform',
                                                      OneHotEncoder(handle_unknown='ignore',
                                                                    sparse_output=False),
                                                      ['abtest', 'vehicleType',
                                                       'gearbox', 'model',
                                                       'fuelType', 'brand'])])),
                    ('regressor',
                     RandomForestRegressor(n_jobs=6, random_state=42, verbose=1))])

Then I cross-validate this pipeline with sklearn.model_selection.cross_validate like this:

    scores = cross_validate(
        model, data_dict['x_train'], data_dict['y_train'],
        cv=cv, scoring=['neg_root_mean_squared_error', 'r2'],
        n_jobs=6, return_estimator=True, return_train_score=True)

However, when I try to access the returned estimators, I only get the RandomForestRegressor and not the full pipeline. I want to save these models and use them later for model selection.

    for fold, model in scores['estimator']:
        print(type(model))

    # Output
    # <class 'sklearn.ensemble._forest.RandomForestRegressor'>
    # <class 'sklearn.ensemble._forest.RandomForestRegressor'>
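A likely explanation for this output: the loop header `for fold, model in scores['estimator']:` tuple-unpacks each element of the list itself, not an (index, element) pair. A Pipeline supports `len()` and integer indexing, so Python can iterate over it, and a two-step Pipeline unpacks into its two steps: `fold` receives the ColumnTransformer and `model` the RandomForestRegressor. The sketch below reproduces this with hypothetical lightweight pipelines (StandardScaler plus LinearRegression standing in for the original steps) and shows `enumerate` as one way to keep the full Pipeline.

```python
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Two hypothetical "per-fold" pipelines, shaped like scores['estimator'].
estimators = [
    Pipeline([('scaler', StandardScaler()), ('regressor', LinearRegression())]),
    Pipeline([('scaler', StandardScaler()), ('regressor', LinearRegression())]),
]

# Pitfall: each 2-step Pipeline is unpacked into its steps (pipe[0], pipe[1]),
# so the second name ends up bound to the final estimator.
unpacked = [type(second).__name__ for first, second in estimators]
print(unpacked)  # ['LinearRegression', 'LinearRegression']

# Fix: enumerate yields (index, pipeline) pairs, so the whole Pipeline is kept.
kept = [(fold, type(pipe).__name__) for fold, pipe in enumerate(estimators)]
print(kept)  # [(0, 'Pipeline'), (1, 'Pipeline')]
```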

Would this even be possible? If not, can you suggest an alternative? I am new to machine learning and not sure if I am going about this the right way.
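If the goal is to keep the per-fold pipelines for later use, one common option is `joblib.dump` / `joblib.load`. Because each saved object is the full fitted Pipeline, the preprocessing is stored together with the regressor. This sketch assumes toy data, a LinearRegression in place of the random forest for speed, and hypothetical file names in a temporary directory. scikit-learn's model persistence guide advises loading such files with the same scikit-learn version that saved them.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_validate
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import RobustScaler

# Hypothetical toy regression data.
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=30)

pipe = Pipeline([('scaler', RobustScaler()), ('regressor', LinearRegression())])
scores = cross_validate(pipe, X, y, cv=3, return_estimator=True)

out_dir = tempfile.mkdtemp()
paths = []
for fold, fitted_pipe in enumerate(scores['estimator']):
    # Hypothetical file naming scheme: one file per fold.
    path = os.path.join(out_dir, f'pipeline_fold{fold}.joblib')
    joblib.dump(fitted_pipe, path)
    paths.append(path)

# Later: reload and use each pipeline like any fitted estimator.
reloaded = [joblib.load(path) for path in paths]
preds = [loaded.predict(X[:5]) for loaded in reloaded]
```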
