140

I am using GridSearchCV from sklearn to optimize the parameters of a classifier. There is a lot of data, so the whole optimization process takes a while: more than a day. I would like to watch the performance of the already-tried parameter combinations during execution. Is that possible?

doubts
  • How about trying it on less data to get a feel for the right parameter range in shorter evaluation cycles, and then check whether your choice of parameters on the reduced set scales properly? That depends on your estimator, which you haven't named. – eickenberg Jun 09 '14 at 17:47
  • That sounds sensible, thanks. I am using wrapper around Vowpal Wabbit. – doubts Jun 10 '14 at 08:51
  • Andreas, the docs just say "verbose : integer. Controls the verbosity: the higher, the more messages." It does not say it clearly. – doubts Jun 11 '14 at 10:58
  • The other part of the story, which I do not know if it was asked, is that you can also get a lot of warning messages if your process takes a day. The verbose setting will not filter these, which still makes monitoring the progress difficult. Is there an approach that also suppresses these warning messages? – demongolem Jun 05 '20 at 12:51
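
Regarding the warning messages raised in the last comment, here is a minimal sketch of one way to suppress them while keeping the verbose progress output. The dataset and estimator are illustrative placeholders, not taken from the question:

import os
import warnings

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Silence warnings in the parent process.
warnings.filterwarnings("ignore")
# With n_jobs > 1 the fits run in separate worker processes, which
# filterwarnings alone does not reach; the environment variable below
# is inherited by the workers and silences their warnings too.
os.environ["PYTHONWARNINGS"] = "ignore"

X, y = load_iris(return_X_y=True)
param_grid = {"C": [0.01, 0.1, 1, 10]}  # illustrative grid

# A low max_iter tends to trigger convergence warnings, which makes
# the suppression visible; the progress lines from verbose still print.
search = GridSearchCV(LogisticRegression(max_iter=50), param_grid,
                      cv=5, verbose=10, n_jobs=2)
search.fit(X, y)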

4 Answers

167

Set the verbose parameter in GridSearchCV to a positive number (the greater the number, the more detail you will get). For instance:

GridSearchCV(clf, param_grid, cv=cv, scoring='accuracy', verbose=10)  
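
To make that concrete, here is a self-contained, runnable sketch; the dataset and estimator are placeholders of my choosing, since the question does not name them:

from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
param_grid = {"C": [0.1, 1, 10], "gamma": ["scale", "auto"]}

# verbose=10 prints one line per individual fit, with its parameters,
# score, and timing (per the comments below, recent scikit-learn
# versions document no additional detail above verbose=3).
search = GridSearchCV(SVC(), param_grid, cv=5, scoring="accuracy", verbose=10)
search.fit(X, y)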
DavidS
  • Just to add: if you are using IPython Notebook, the output is in the IPython terminal window, not in the interactive session. – arun Apr 03 '16 at 01:40
  • What is the actual highest meaningful value of this parameter? The docs mention only "the higher, the more messages." So how high can we go and still get more messages? – Daddy32 Jun 08 '20 at 11:24
  • As Arturo said below, "verbose=2 is a great choice for most of the practices. It will return one line per parameter set (including CV)". – Marc Apr 25 '21 at 18:42
  • On my system, I had to set `n_jobs=1` (the default), or no message was shown in JupyterLab. – Marc Apr 25 '21 at 18:48
  • @Daddy32 did you find out which one was the highest? – Caterina Mar 31 '22 at 13:02
  • No, but per [current documentation](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.GridSearchCV.html), it seems any value >3 is currently equivalent. I haven't had time to confirm this in the sources. – Daddy32 Apr 01 '22 at 16:18
  • The highest meaningful value is verbose=3, which is great, because it gives the params tested in that batch and, most importantly, the score for that specific set of params as it progresses. Maybe 10 was a setting way back in 2014, but nothing above 3 does anything more these days. – Bourne Jul 21 '22 at 18:15
  • Why has this answer still not been edited replacing `verbose=10` with `verbose=3`? I get "There are too many pending edits on Stack Overflow". – Nermin Apr 24 '23 at 09:16
38

I would just like to complement DavidS's answer with an example of what each verbosity level actually prints.

To give you an idea, for a very simple case, this is how it looks with verbose=1:

Fitting 10 folds for each of 1 candidates, totalling 10 fits
[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.
[Parallel(n_jobs=1)]: Done  10 out of  10 | elapsed:  1.2min finished

And this is how it looks with verbose=10:

Fitting 10 folds for each of 1 candidates, totalling 10 fits
[CV] booster=gblinear, learning_rate=0.0001, max_depth=3, n_estimator=100, subsample=0.1 
[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.
[CV]  booster=gblinear, learning_rate=0.0001, max_depth=3, n_estimator=100, subsample=0.1, score=0.637, total=   7.1s
[CV] booster=gblinear, learning_rate=0.0001, max_depth=3, n_estimator=100, subsample=0.1 
[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:    7.0s remaining:    0.0s
[CV]  booster=gblinear, learning_rate=0.0001, max_depth=3, n_estimator=100, subsample=0.1, score=0.630, total=   6.5s
[CV] booster=gblinear, learning_rate=0.0001, max_depth=3, n_estimator=100, subsample=0.1 
[Parallel(n_jobs=1)]: Done   2 out of   2 | elapsed:   13.5s remaining:    0.0s
[CV]  booster=gblinear, learning_rate=0.0001, max_depth=3, n_estimator=100, subsample=0.1, score=0.637, total=   6.5s
[CV] booster=gblinear, learning_rate=0.0001, max_depth=3, n_estimator=100, subsample=0.1 
[Parallel(n_jobs=1)]: Done   3 out of   3 | elapsed:   20.0s remaining:    0.0s
[CV]  booster=gblinear, learning_rate=0.0001, max_depth=3, n_estimator=100, subsample=0.1, score=0.637, total=   6.7s
[CV] booster=gblinear, learning_rate=0.0001, max_depth=3, n_estimator=100, subsample=0.1 
[Parallel(n_jobs=1)]: Done   4 out of   4 | elapsed:   26.7s remaining:    0.0s
[CV]  booster=gblinear, learning_rate=0.0001, max_depth=3, n_estimator=100, subsample=0.1, score=0.632, total=   7.9s
[CV] booster=gblinear, learning_rate=0.0001, max_depth=3, n_estimator=100, subsample=0.1 
[Parallel(n_jobs=1)]: Done   5 out of   5 | elapsed:   34.7s remaining:    0.0s
[CV]  booster=gblinear, learning_rate=0.0001, max_depth=3, n_estimator=100, subsample=0.1, score=0.622, total=   6.9s
[CV] booster=gblinear, learning_rate=0.0001, max_depth=3, n_estimator=100, subsample=0.1 
[Parallel(n_jobs=1)]: Done   6 out of   6 | elapsed:   41.6s remaining:    0.0s
[CV]  booster=gblinear, learning_rate=0.0001, max_depth=3, n_estimator=100, subsample=0.1, score=0.627, total=   7.1s
[CV] booster=gblinear, learning_rate=0.0001, max_depth=3, n_estimator=100, subsample=0.1 
[Parallel(n_jobs=1)]: Done   7 out of   7 | elapsed:   48.7s remaining:    0.0s
[CV]  booster=gblinear, learning_rate=0.0001, max_depth=3, n_estimator=100, subsample=0.1, score=0.628, total=   7.2s
[CV] booster=gblinear, learning_rate=0.0001, max_depth=3, n_estimator=100, subsample=0.1 
[Parallel(n_jobs=1)]: Done   8 out of   8 | elapsed:   55.9s remaining:    0.0s
[CV]  booster=gblinear, learning_rate=0.0001, max_depth=3, n_estimator=100, subsample=0.1, score=0.640, total=   6.6s
[CV] booster=gblinear, learning_rate=0.0001, max_depth=3, n_estimator=100, subsample=0.1 
[Parallel(n_jobs=1)]: Done   9 out of   9 | elapsed:  1.0min remaining:    0.0s
[CV]  booster=gblinear, learning_rate=0.0001, max_depth=3, n_estimator=100, subsample=0.1, score=0.629, total=   6.6s
[Parallel(n_jobs=1)]: Done  10 out of  10 | elapsed:  1.2min finished

In my case, verbose=1 does the trick.
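
For reference, a sketch of the kind of call that could produce the output above. Judging by the booster parameter, the estimator is presumably XGBoost's XGBClassifier, but that is my inference; the grid values are copied from the log, and synthetic data stands in for the original:

from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from xgboost import XGBClassifier  # inferred from booster=gblinear in the log

X, y = make_classification(n_samples=1000, random_state=0)

# Values mirror the log above; the log's "n_estimator" is presumably
# a typo for XGBClassifier's actual "n_estimators" parameter.
param_grid = {
    "booster": ["gblinear"],
    "learning_rate": [0.0001],
    "max_depth": [3],
    "n_estimators": [100],
    "subsample": [0.1],
}

search = GridSearchCV(XGBClassifier(), param_grid, cv=10, verbose=10)
search.fit(X, y)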

Arturo Moncada-Torres
  • In my opinion, `verbose=2` is a great choice in most cases. It will return one line per parameter set (including CV). – mhellmeier Feb 15 '21 at 18:02
15

Check out GridSearchCVProgressBar from the pactools package.

I just found it and I'm using it. Very into it:

In [1]: GridSearchCVProgressBar
Out[1]: pactools.grid_search.GridSearchCVProgressBar

In [2]: ??GridSearchCVProgressBar
Init signature: GridSearchCVProgressBar(estimator, param_grid, scoring=None, fit_params=None, n_jobs=1, iid=True, refit=True, cv=None, verbose=0, pre_dispatch='2*n_jobs', error_score='raise', return_train_score='warn')
Source:
class GridSearchCVProgressBar(model_selection.GridSearchCV):
    """Monkey patch Parallel to have a progress bar during grid search"""

    def _get_param_iterator(self):
        """Return ParameterGrid instance for the given param_grid"""

        iterator = super(GridSearchCVProgressBar, self)._get_param_iterator()
        iterator = list(iterator)
        n_candidates = len(iterator)

        cv = model_selection._split.check_cv(self.cv, None)
        n_splits = getattr(cv, 'n_splits', 3)
        max_value = n_candidates * n_splits

        class ParallelProgressBar(Parallel):
            def __call__(self, iterable):
                bar = ProgressBar(max_value=max_value, title='GridSearchCV')
                iterable = bar(iterable)
                return super(ParallelProgressBar, self).__call__(iterable)

        # Monkey patch
        model_selection._search.Parallel = ParallelProgressBar

        return iterator
File:           ~/anaconda/envs/python3/lib/python3.6/site-packages/pactools/grid_search.py
Type:           ABCMeta

In [3]: ?GridSearchCVProgressBar
Init signature: GridSearchCVProgressBar(estimator, param_grid, scoring=None, fit_params=None, n_jobs=1, iid=True, refit=True, cv=None, verbose=0, pre_dispatch='2*n_jobs', error_score='raise', return_train_score='warn')
Docstring:      Monkey patch Parallel to have a progress bar during grid search
File:           ~/anaconda/envs/python3/lib/python3.6/site-packages/pactools/grid_search.py
Type:           ABCMeta
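
Since the introspection above only shows the source, here is a minimal usage sketch based on the signature shown. Because the class subclasses GridSearchCV, it should work as a drop-in replacement (pactools must be installed; the estimator and grid are illustrative):

from pactools.grid_search import GridSearchCVProgressBar
from sklearn.datasets import load_iris
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
param_grid = {"C": [0.1, 1, 10]}

# Drop-in replacement for GridSearchCV that draws a progress bar
# advancing once per fit (candidates x CV splits).
search = GridSearchCVProgressBar(SVC(), param_grid, cv=5)
search.fit(X, y)
print(search.best_params_)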
Nate Anderson
O.rka
0

Quick workaround: if you are using a notebook in Chrome, just search (Ctrl+F) for any word in the grid search output. Chrome will automatically update the matches as GridSearch returns more output to the notebook.

[Screenshot: Jupyter Notebook with GridSearch output]