I am confused about the following things:
- Data splitting into training, validation, and testing
- How and at which step should hyperparameter tuning be performed, and which data should be used for it?
- Can stratified k-fold cross-validation be performed on the `best_estimator_` obtained via `RandomizedSearchCV`?
- Finally, which model should be used for deployment? Do I need to retrain the `best_estimator_` on the entire training dataset to get the final model?
I have tried an approach to hyperparameter tuning and cross-validation of a scikit-learn model, and I would like confirmation from machine-learning experts on whether the approach is correct.
Let me explain briefly what I have done:
I have a training dataset and a separate testing dataset. Using only the training dataset, I performed hyperparameter tuning via `RandomizedSearchCV` with `cv=5`. Once I obtained the `best_estimator_`, I ran a stratified 5-fold cross-validation in which the training dataset was split into train and validation folds: in each iteration, the `best_estimator_` was retrained on the train folds and evaluated on the validation fold, in order to check how well it generalizes. Finally, the performance of the `best_estimator_` was evaluated on the separate, unseen testing dataset.
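Here is a minimal, self-contained sketch of that workflow. The synthetic dataset (`make_classification`), the `RandomForestClassifier`, and the parameter grid are just stand-ins for my actual data, model, and search space:

```python
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import (
    RandomizedSearchCV,
    StratifiedKFold,
    cross_val_score,
    train_test_split,
)

# Stand-in for my separate training and testing datasets.
X, y = make_classification(n_samples=2000, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Step 1: hyperparameter tuning on the training set only,
# with an internal 5-fold CV inside the search.
param_distributions = {
    "n_estimators": [100, 200, 500],
    "max_depth": [None, 5, 10],
}
search = RandomizedSearchCV(
    RandomForestClassifier(random_state=0),
    param_distributions,
    n_iter=5,
    cv=5,
    random_state=0,
)
search.fit(X_train, y_train)
best_model = search.best_estimator_

# Step 2: stratified 5-fold CV of the tuned configuration, again on
# the training set, to check generalization. clone() gives a fresh,
# unfitted copy with the same hyperparameters for each fold.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(clone(best_model), X_train, y_train, cv=cv)
print("CV accuracy: %.3f +/- %.3f" % (scores.mean(), scores.std()))

# Step 3: one final evaluation on the unseen testing dataset.
print("Test accuracy: %.3f" % best_model.score(X_test, y_test))
```

(As far as I understand, with the default `refit=True`, `RandomizedSearchCV` already refits `best_estimator_` on the full training set, which is partly why I am unsure whether a separate retraining step is needed before deployment.)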
Any corrections or suggestions would be highly appreciated. Thanks in advance!