I have created a pipeline that looks like this:
```python
Pipeline(steps=[('preprocessor',
                 ColumnTransformer(transformers=[('numerical_transform',
                                                  RobustScaler(),
                                                  ['powerPS', 'kilometer',
                                                   'age_of_car']),
                                                 ('categorical_transform',
                                                  OneHotEncoder(handle_unknown='ignore',
                                                                sparse_output=False),
                                                  ['abtest', 'vehicleType',
                                                   'gearbox', 'model',
                                                   'fuelType', 'brand'])])),
                ('regressor',
                 RandomForestRegressor(n_jobs=6, random_state=42, verbose=1))])
```
Then I cross-validate this pipeline with `sklearn.model_selection.cross_validate` like this:
```python
scores = cross_validate(
    model, data_dict['x_train'], data_dict['y_train'],
    cv=cv,
    scoring=['neg_root_mean_squared_error', 'r2'],
    n_jobs=6,
    return_estimator=True,
    return_train_score=True,
)
```
However, when I try to access the returned estimators, I only get `RandomForestRegressor` objects and not the full pipelines. I want to save these models and use them later on for model selection.
```python
for fold, model in scores['estimator']:
    print(type(model))

# Output
# <class 'sklearn.ensemble._forest.RandomForestRegressor'>
# <class 'sklearn.ensemble._forest.RandomForestRegressor'>
```
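For comparison, here is a small self-contained sketch that checks what each item of `scores['estimator']` actually is. Synthetic `make_regression` data and a simplified two-step pipeline (`StandardScaler` plus a small forest) are assumptions standing in for the real data and preprocessor. It suggests the items are full `Pipeline` objects: a `Pipeline` supports integer indexing (`pipe[0]`, `pipe[1]`), so writing `for fold, model in ...` appears to unpack each two-step pipeline into its steps, leaving the second name bound to the final `RandomForestRegressor`. Iterating with `enumerate` keeps each fitted pipeline intact.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_validate
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (assumption: not the question's dataset)
X, y = make_regression(n_samples=100, n_features=4, random_state=0)

# Simplified two-step pipeline: scaler followed by a forest
pipe = Pipeline(steps=[
    ('preprocessor', StandardScaler()),
    ('regressor', RandomForestRegressor(n_estimators=10, random_state=42)),
])

scores = cross_validate(pipe, X, y, cv=3, return_estimator=True)

# Unpacking each fitted Pipeline into two names splits it into its two steps,
# so the second name ends up holding the final RandomForestRegressor
unpacked = [(type(first).__name__, type(last).__name__)
            for first, last in scores['estimator']]
print(unpacked)

# enumerate yields (fold index, fitted Pipeline) pairs instead
for fold, fitted in enumerate(scores['estimator']):
    print(fold, type(fitted).__name__)
```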
Would this even be possible? If not, can you suggest an alternative? I am new to machine learning and not sure if I am going about this the right way.
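On the goal of saving the fold models: a minimal sketch, assuming `joblib` (which is installed alongside scikit-learn) is acceptable for persistence. It dumps each fold's fitted `Pipeline` to a temporary directory under a hypothetical naming scheme `fold_<i>.joblib`, loads them back, and checks that the reloaded pipelines predict the same values. Synthetic `make_regression` data and a simplified pipeline again stand in for the real ones.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_validate
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data and a simplified pipeline (assumptions)
X, y = make_regression(n_samples=100, n_features=4, random_state=0)
pipe = Pipeline(steps=[
    ('preprocessor', StandardScaler()),
    ('regressor', RandomForestRegressor(n_estimators=10, random_state=42)),
])
scores = cross_validate(pipe, X, y, cv=3, return_estimator=True)

out_dir = tempfile.mkdtemp()
paths = []
for fold, fitted in enumerate(scores['estimator']):
    # Hypothetical naming scheme: one file per fold
    path = os.path.join(out_dir, f'fold_{fold}.joblib')
    joblib.dump(fitted, path)
    paths.append(path)

# Reload every pipeline and compare its predictions with the in-memory original
restored = [joblib.load(p) for p in paths]
same = all(
    np.allclose(original.predict(X), loaded.predict(X))
    for original, loaded in zip(scores['estimator'], restored)
)
print(same)
```

Because the whole pipeline is saved, a reloaded model includes its preprocessing and can be applied to raw feature data directly.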