I have observed in many articles and books that model selection is done before model tuning.

Model selection is generally done using some form of cross-validation, such as k-fold, where each candidate model's metrics are calculated and the best one is selected.

The selected model is then tuned to find the best hyperparameters.

But my concern is that a model that was not selected might have performed better with the right hyperparameters.

So why not tune all the models we are interested in to find their best hyperparameters first, and then select the best model by cross-validation?

Mohit Shah

1 Answer

It depends on the experimental set-up followed in each article/book, but in short, the correct way of performing model selection and hyperparameter optimisation in the same experiment is to use nested cross-validation:

  • An outer loop that evaluates the performance of the model (as usual)
  • An inner loop (which further splits the dataset formed by the N-1 training partitions of the outer loop) that performs hyperparameter optimisation within every outer fold; see the sketch below.
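
For concreteness, here is a minimal sketch of that scheme using scikit-learn. The dataset, candidate models, and hyperparameter grids are placeholders chosen only for illustration, not a prescription:

```python
# Nested cross-validation sketch: the inner loop (GridSearchCV) tunes
# hyperparameters on each outer training split, and the outer loop scores
# the tuned model on the corresponding held-out fold.
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, KFold, cross_val_score
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)  # placeholder dataset

# Candidate models, each with its own (illustrative) hyperparameter grid.
candidates = {
    "svm": (SVC(), {"C": [0.1, 1, 10], "gamma": ["scale", 0.01]}),
    "rf": (RandomForestClassifier(random_state=0),
           {"n_estimators": [100, 300], "max_depth": [None, 5]}),
}

inner_cv = KFold(n_splits=3, shuffle=True, random_state=0)  # hyperparameter search
outer_cv = KFold(n_splits=5, shuffle=True, random_state=0)  # performance estimate

for name, (estimator, grid) in candidates.items():
    tuned = GridSearchCV(estimator, grid, cv=inner_cv)
    scores = cross_val_score(tuned, X, y, cv=outer_cv)
    print(f"{name}: {scores.mean():.3f} +/- {scores.std():.3f}")
```

This way every candidate family is compared on the basis of its tuned performance, while the outer-loop estimate remains unbiased because no outer test fold is ever used for tuning.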

You can have a look at this other question to learn more about this validation scheme.

Note, however, that in some cases it can be acceptable to just do a general comparison with all the models and then optimise only the top-performing ones. But in a rigorous study this is far from ideal.

carrdelling
  • Could you tell me a case when it is acceptable to do a general comparison? – Mohit Shah Mar 11 '18 at 05:30
  • Sure: for example, when you are not that interested in testing the models themselves, but rather in a different step in your pipeline. Say you develop a new feature selection method. You can check how it works with several families of models (using sensible default configurations) and focus your efforts on tuning the hyperparameters of the feature selection method. Several major families of models (e.g. k-NN, decision trees, naive Bayes...) will generally do well without a lot of hyperparameter optimisation, so you can just do a general comparison there. – carrdelling Mar 11 '18 at 10:32