
I am trying to get a score for a model through cross-validation with sklearn.cross_validation.cross_val_score. According to its documentation, the n_jobs parameter sets the number of CPUs to use. However, when I set it to -1 (or any value other than 1), the program fails with:

AttributeError: '_MainProcess' object has no attribute '_daemonic'

Below is a minimal example, along with the corresponding error message.

import sklearn.datasets
import sklearn.cross_validation
import sklearn.linear_model
d = sklearn.datasets.load_iris()
X = d.data
y = d.target
sklearn.cross_validation.cross_val_score(sklearn.linear_model.LogisticRegression(), X, y, n_jobs=-1)

AttributeError                            Traceback (most recent call last)
<ipython-input-57-3b5f62e97b0d> in <module>()
    ----> 1 sklearn.cross_validation.cross_val_score(gb_clf, train, train_label, n_jobs=2)

/usr/lib/python3.4/site-packages/sklearn/cross_validation.py in cross_val_score(estimator, X, y,     scoring, cv, n_jobs, verbose, fit_params, score_func, pre_dispatch)
   1150         delayed(_cross_val_score)(clone(estimator), X, y, scorer, train, test,
   1151                                   verbose, fit_params)
-> 1152         for train, test in cv)
   1153     return np.array(scores)
   1154 

/usr/lib/python3.4/site-packages/sklearn/externals/joblib/parallel.py in __call__(self, iterable)
    468             self._pool = None
    469         else:
--> 470             if multiprocessing.current_process()._daemonic:
    471                 # Daemonic processes cannot have children
    472                 n_jobs = 1

AttributeError: '_MainProcess' object has no attribute '_daemonic'

Additional information: I am running this script in IPython notebook mode. The error also appears in IPython console mode and under the normal Python interpreter (per @larsmans' comment).

K.Chen
  • Solution to this problem: it is caused by an older version (0.14.0) of sklearn; upgrading to version 0.15.0b1 solves it. – K.Chen Jul 01 '14 at 19:14
  • For details, see https://github.com/scikit-learn/scikit-learn/issues/3323 – K.Chen Jul 01 '14 at 19:21
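For reference, the sklearn.cross_validation module mentioned above was later deprecated (0.18) and removed (0.20); on a recent scikit-learn the same call lives in sklearn.model_selection. A minimal sketch of the question's example against the modern API (the max_iter value is only an assumption, added to avoid convergence warnings):

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score  # replaces sklearn.cross_validation

X, y = load_iris(return_X_y=True)
# n_jobs=-1 uses all available cores via joblib, as in the question
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5, n_jobs=-1)
print(scores)  # one accuracy score per fold
```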

2 Answers


The combination of IPython notebook, NumPy-heavy code (like scikit-learn) and joblib/multiprocessing (used when n_jobs != 1) is problematic and can cause all kinds of crashes, freezes and strange error messages. The NumPy/SciPy community is aware of this, but has AFAIK not yet diagnosed what exactly is going wrong, let alone produced a fix.(*) I advise you to run this code outside the IPython notebook.

(*) Be sure to search the mailing lists for the various projects if you're interested. The problem probably stems from IPython's use of ZeroMQ, a multithreaded C library, in conjunction with Python multiprocessing's habit of calling fork without exec in violation of POSIX. Similar problems occur when NumPy calls multithreaded linear algebra libraries in a multiprocessing context.

Fred Foo

You will have to protect your code:

if __name__ == "__main__":
    [Your code]

There seem to be issues with joblib.Parallel when it comes to multiprocessing (n_jobs > 1). More information can be found in the joblib documentation, and there is also a GitHub thread discussing the problem.
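Applied to the question's example, the guard might look like this (a sketch using the modern sklearn.model_selection import, since sklearn.cross_validation was removed in 0.20; max_iter is an assumption added to avoid convergence warnings):

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def run_cv():
    X, y = load_iris(return_X_y=True)
    return cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5, n_jobs=2)

# With the guard, joblib's worker processes can safely re-import this module
# without re-executing the parallel call at import time.
if __name__ == "__main__":
    print(run_cv())
```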

phd
xhlu