I am trying to run the following code on a machine with 16 available CPUs:
```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import GridSearchCV
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

def tokenizer(text):
    return text.split()

param_grid = [{'vect__stop_words': [None, stop],
               'vect__binary': [True, False]}]

bow = CountVectorizer(ngram_range=(1, 1), tokenizer=tokenizer)
multinb_bow = Pipeline([('vect', bow), ('clf', MultinomialNB())])
gs_multinb_bow = GridSearchCV(multinb_bow, param_grid, scoring='f1_macro',
                              cv=3, verbose=1, n_jobs=-1)
gs_multinb_bow.fit(X_train, y_train)
```
I set `n_jobs` to `-1`, but scikit-learn switches to `SequentialBackend`, and the script runs with only one concurrent worker, even if I wrap the fit in a `with parallel_backend('loky'):` context manager:
```
Fitting 3 folds for each of 4 candidates, totalling 12 fits
[Parallel(n_jobs=-1)]: Using backend SequentialBackend with 1 concurrent workers.
```
The same result persists if I specify a different value for `n_jobs`.

Why is this happening? I recently ran what seems to be identical code on a similar task, and grid search ran in parallel on multiple CPUs using `LokyBackend`, as specified by `n_jobs`.
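For reference, here is a minimal self-contained version of what I am running, with toy data and a toy stop-word list standing in for my real `X_train`, `y_train`, and `stop` (those stand-ins are mine, everything else matches the code above):

```python
from joblib import parallel_backend
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import GridSearchCV
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

def tokenizer(text):
    return text.split()

# Toy stand-ins for the real training data and stop-word list
X_train = ["good movie", "bad movie", "great film", "awful film",
           "nice plot", "terrible plot"] * 5
y_train = [1, 0, 1, 0, 1, 0] * 5
stop = ['the', 'a']

param_grid = [{'vect__stop_words': [None, stop],
               'vect__binary': [True, False]}]

bow = CountVectorizer(ngram_range=(1, 1), tokenizer=tokenizer)
multinb_bow = Pipeline([('vect', bow), ('clf', MultinomialNB())])
gs_multinb_bow = GridSearchCV(multinb_bow, param_grid, scoring='f1_macro',
                              cv=3, verbose=1, n_jobs=-1)

# Explicitly request the process-based loky backend; the verbose
# output reports which backend joblib actually selected
with parallel_backend('loky'):
    gs_multinb_bow.fit(X_train, y_train)
```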