
I am trying to run the following code on a machine with 16 available CPUs:

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import GridSearchCV
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

def tokenizer(text):
    return text.split()

# `stop` is a list of stop words defined earlier in the script
param_grid = [{'vect__stop_words': [None, stop],
               'vect__binary': [True, False]}]

bow = CountVectorizer(ngram_range=(1, 1), tokenizer=tokenizer)
multinb_bow = Pipeline([('vect', bow), ('clf', MultinomialNB())])

gs_multinb_bow = GridSearchCV(multinb_bow, param_grid, scoring='f1_macro',
                              cv=3, verbose=1, n_jobs=-1)

gs_multinb_bow.fit(X_train, y_train)

I set n_jobs=-1, but scikit-learn switches to SequentialBackend, and even if I wrap the fit in a with parallel_backend('loky'): context manager (a sketch of that attempt is shown below), the script still runs with only 1 concurrent worker:

Fitting 3 folds for each of 4 candidates, totalling 12 fits
[Parallel(n_jobs=-1)]: Using backend SequentialBackend with 1 concurrent workers.

The same result persists if I specify a different explicit value for n_jobs.
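
For reference, this is roughly how I am forcing the backend. It is a minimal sketch assuming joblib's parallel_backend context manager and reusing the gs_multinb_bow object defined above:

from joblib import parallel_backend

# Explicitly request the loky backend; the grid search should then
# dispatch the 12 fits across multiple worker processes.
with parallel_backend('loky'):
    gs_multinb_bow.fit(X_train, y_train)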

Why is this happening? I recently ran what seems to be identical code on a similar task, and the grid search ran in parallel on multiple CPUs with LokyBackend, as specified by n_jobs.
