
I have a binary time series classification problem.

Since it is a time series, I can't just train_test_split my data. So, I used the object `tscv = TimeSeriesSplit()` from this link, and got something like this:

[image: the train/test indices produced by TimeSeriesSplit]
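Roughly, as a minimal sketch of what I did (with a placeholder array in place of my actual series):

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

X = np.arange(20).reshape(-1, 1)        # placeholder features, in time order
y = np.random.randint(0, 2, size=20)    # placeholder binary labels

tscv = TimeSeriesSplit()                # 5 splits by default
for train_idx, test_idx in tscv.split(X):
    print("TRAIN:", train_idx, "TEST:", test_idx)
```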

I can see that GridSearchCV and cross_val_score both accept my split strategy as the parameter cv=tscv. But my question is: what's the difference between GridSearchCV and cross_val_score? Is using one of them enough to train/test my model, or should I use both? First GridSearchCV to get the best hyperparameters, and then cross_val_score?

Murilo
  • They are essentially doing the same job, but in different ways. In fact, `GridSearchCV` itself uses `cross_val_score` to find the optimal combination of parameters. Grid search is known to be a very slow way of tuning your hyperparameters, and you are much better off sticking with `RandomizedSearchCV` or the more advanced Bayesian hyperparameter optimization methods – meti Dec 11 '21 at 15:40
  • @meti Then is the cross-validation used by GridSearchCV only used to find the best hyperparameters, or also to calculate the metrics of the model (such as roc_auc) with cross-validation? After GridSearchCV(), do we need to use cross_validate(), or is that already included in it? – skan Jul 31 '23 at 00:54
  • @skan If you use GridSearchCV, it will give you the best hyperparameter combination according to your chosen scoring metric. However, if you want a report on a specific metric such as the ROC curve, you will need to compute it separately using the best combination of hyperparameters. – meti Aug 06 '23 at 11:29
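To make the last comment concrete, here is a minimal sketch (the classifier, the parameter grid, and the random placeholder data are all assumptions for illustration): GridSearchCV ranks the parameter combinations by the metric passed to `scoring`, and any other report has to be computed afterwards, e.g. from `best_estimator_` on a later hold-out slice.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit

# Random placeholder data standing in for the real, time-ordered series.
rng = np.random.default_rng(0)
X, y = rng.normal(size=(200, 5)), rng.integers(0, 2, size=200)
X_train, X_holdout = X[:160], X[160:]     # keep the last chunk for later
y_train, y_holdout = y[:160], y[160:]

# The internal CV ranks the parameter combinations by the `scoring` metric.
search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    {"n_estimators": [100, 300], "max_depth": [3, None]},
    scoring="roc_auc",
    cv=TimeSeriesSplit(),
)
search.fit(X_train, y_train)
print(search.best_params_, search.best_score_)  # best combination and its CV ROC AUC

# Any further report is not produced by the search itself; compute it yourself,
# here ROC AUC on the hold-out slice using the refitted best estimator.
proba = search.best_estimator_.predict_proba(X_holdout)[:, 1]
print("hold-out ROC AUC:", roc_auc_score(y_holdout, proba))
```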

1 Answer


Grid search is a method for evaluating a model under different hyperparameter settings (the values of which you define in advance). The grid search can use cross-validation (hence GridSearchCV exists) in order to deliver a final score for each of the parameter settings of your model. After the search has finished, you can look up the parameters with which your model performed best via the best_params_ attribute (a dict). So grid search is basically a brute-force strategy in which you run the model with every predefined hyperparameter combination.

With cross_val_score you don't perform a grid search (you don't use the strategy above of trying all predefined parameter combinations); you just get the cross-validated score of a single model with fixed parameters. I hope it is now clear.
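For illustration, a minimal sketch of the difference, with a LogisticRegression and random placeholder data standing in for an actual time series:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit, cross_val_score

# Random placeholder data standing in for the real, time-ordered series.
rng = np.random.default_rng(42)
X = rng.normal(size=(300, 4))
y = rng.integers(0, 2, size=300)

tscv = TimeSeriesSplit()

# cross_val_score: evaluates ONE model with fixed hyperparameters via CV.
clf = LogisticRegression(C=1.0)
print(cross_val_score(clf, X, y, cv=tscv).mean())

# GridSearchCV: runs the same kind of CV internally, but once for EVERY
# combination in the grid, and remembers the best one.
search = GridSearchCV(LogisticRegression(), {"C": [0.01, 0.1, 1.0, 10.0]}, cv=tscv)
search.fit(X, y)
print(search.best_params_)  # the winning combination (a dict)
print(search.best_score_)   # its mean cross-validated score
```

So a common workflow for the question above is: run GridSearchCV (with cv=tscv) to pick the hyperparameters, and then either report the cross-validated score it already computed or re-evaluate the chosen model on a later hold-out period; a separate cross_val_score call on the same data is not strictly necessary.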

teoML
  • But the scores of the two are sometimes drastically different. The mean of `cross_val_score` with k=5 may be around 0.5, whereas the best score from `GridSearchCV` can stay very high. Any idea why that is the case? – Salih Jan 11 '23 at 05:13
  • It should be because of the parameters (since in the grid search you might use other params compared to the CV-only run). Give me a concrete example. – teoML Jan 12 '23 at 08:16