
I am running SVR and SVC optimizations through GridSearchCV with parallelization (n_jobs=-1, which means 8 workers in my case). My question is: why do the first fits run so much faster than the last fits? As the photo shows, 10212 fits took 23.7 sec, but the full 106764 fits needed 20.7 min, whereas a linear extrapolation would predict only about 4.1 minutes.

Here is a sample of the code:

from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVR

opt = GridSearchCV(SVR(tol=tol), param_grid=param_grid, scoring=scoring, n_jobs=n_jobs, cv=cv, verbose=verbose)
opt.fit(allr_sets_nor[:, :2], allr_sets_nor[:, 2])

And this is the screen log:

[image: screen log]

Ahmad Sultan

1 Answer


The training time of a support vector machine depends heavily on the parameters given.

Parameters like C affect the number of support vectors, so candidates that end up with many support vectors (indirectly controlled by C) train much more slowly.

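This is a basic caveat of grid searches: the candidates differ widely in cost, so the time taken by the first fits cannot be extrapolated linearly to the whole grid.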
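A rough way to check this on your own machine is to time single fits for a few values of C. The sketch below is only an illustration: the synthetic data, the chosen values of C, gamma and epsilon, and the `results` list are all assumptions of mine, not taken from the question. The actual timings will vary by machine and data, and they need not grow monotonically with C.

```python
import time

import numpy as np
from sklearn.svm import SVR

# Synthetic regression data with two features (an assumption, standing in
# for the asker's allr_sets_nor array).
rng = np.random.RandomState(0)
X = rng.uniform(-1, 1, size=(400, 2))
y = np.sin(3 * X[:, 0]) + X[:, 1] ** 2 + 0.1 * rng.randn(400)

results = []  # (C, fit time in seconds, number of support vectors)
for C in (0.01, 1.0, 100.0):
    model = SVR(C=C, gamma=1.0, epsilon=0.05)
    start = time.perf_counter()
    model.fit(X, y)
    elapsed = time.perf_counter() - start
    n_sv = len(model.support_)  # indices of the support vectors
    results.append((C, elapsed, n_sv))
    print(f"C={C:<7} fit time={elapsed:.4f}s  support vectors={n_sv}")
```

If the fit times differ by a large factor between the rows, the same spread is to be expected between the candidates of a grid search.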

(Another slightly more complete take on this here by user lejlot)

The learning algorithms are also based on heuristics, which adds a further, hard-to-predict factor to this.
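To see which candidates were the slow ones after a search has finished, you can look at `cv_results_["mean_fit_time"]` on the fitted GridSearchCV object. This is a minimal sketch with a small made-up grid and synthetic data (both are my assumptions, and `ranked` is just an illustrative name); I use n_jobs=1 here only to keep the example simple.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVR

# Synthetic data and a tiny grid, both chosen only for illustration.
rng = np.random.RandomState(0)
X = rng.uniform(-1, 1, size=(300, 2))
y = np.sin(3 * X[:, 0]) + X[:, 1] ** 2 + 0.1 * rng.randn(300)

param_grid = {"C": [0.1, 10.0, 100.0], "gamma": [0.1, 1.0]}
opt = GridSearchCV(SVR(), param_grid=param_grid, cv=3, n_jobs=1)
opt.fit(X, y)

# Pair each candidate with its mean fit time over the CV folds,
# then list the slowest candidates first.
fit_times = opt.cv_results_["mean_fit_time"]
params = opt.cv_results_["params"]
ranked = sorted(zip(fit_times, params), key=lambda pair: pair[0], reverse=True)
for mean_time, candidate in ranked:
    print(f"{mean_time:.4f}s  {candidate}")
```

If the slow candidates cluster in one corner of the grid, that would explain why the later part of the search takes much longer than the beginning.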

sascha
  • Okay, thank you. I thought it might be a problem with the parallelization. – Ahmad Sultan Feb 20 '17 at 16:25
  • @AhmadSultan This kind of parallelization is much simpler than other more granular approaches. Of course something can go wrong, but often achieving linear speedup is not a problem. – sascha Feb 20 '17 at 16:30