
I am trying to get a score for a model through cross-validation with sklearn.cross_validation.cross_val_score. According to its documentation, the n_jobs parameter sets the number of CPUs to use. However, when I set it to -1 (or any value other than 1), the program fails with:

AttributeError: '_MainProcess' object has no attribute '_daemonic'

Below is a minimal example, along with the corresponding error message.

import sklearn.datasets
import sklearn.cross_validation
import sklearn.linear_model
d = sklearn.datasets.load_iris()
X = d.data
y = d.target
sklearn.cross_validation.cross_val_score(sklearn.linear_model.LogisticRegression(), X, y, n_jobs=-1)

AttributeError                            Traceback (most recent call last)
<ipython-input-57-3b5f62e97b0d> in <module>()
    ----> 1 sklearn.cross_validation.cross_val_score(gb_clf, train, train_label, n_jobs=2)

/usr/lib/python3.4/site-packages/sklearn/cross_validation.py in cross_val_score(estimator, X, y,     scoring, cv, n_jobs, verbose, fit_params, score_func, pre_dispatch)
   1150         delayed(_cross_val_score)(clone(estimator), X, y, scorer, train, test,
   1151                                   verbose, fit_params)
-> 1152         for train, test in cv)
   1153     return np.array(scores)
   1154 

/usr/lib/python3.4/site-packages/sklearn/externals/joblib/parallel.py in __call__(self, iterable)
    468             self._pool = None
    469         else:
--> 470             if multiprocessing.current_process()._daemonic:
    471                 # Daemonic processes cannot have children
    472                 n_jobs = 1

AttributeError: '_MainProcess' object has no attribute '_daemonic'

Additional information: I am running this script in IPython notebook mode. The error also appears in IPython console mode and under the normal Python interpreter (per @larsmans' comment).

K.Chen
  • Solution to this problem: it is caused by an older version (0.14.0) of sklearn; upgrading to version 0.15.0b1 solves it. – K.Chen Jul 01 '14 at 19:14
  • For details, see https://github.com/scikit-learn/scikit-learn/issues/3323 – K.Chen Jul 01 '14 at 19:21
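For reference, the sklearn.cross_validation module mentioned above was later deprecated (0.18) and removed (0.20); on a recent scikit-learn the same call lives in sklearn.model_selection. A minimal sketch of the question's example against the modern API (the max_iter value is only an assumption, added to avoid convergence warnings):

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score  # replaces sklearn.cross_validation

X, y = load_iris(return_X_y=True)
# n_jobs=-1 uses all available cores via joblib, as in the question
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5, n_jobs=-1)
print(scores)  # one accuracy score per fold
```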

2 Answers


The combination of IPython notebook, NumPy-heavy code (like scikit-learn) and joblib/multiprocessing (used when n_jobs != 1) is problematic and can cause all kinds of crashes, freezes and strange error messages. The NumPy/SciPy community is aware of this, but has AFAIK not yet diagnosed what exactly is going wrong, let alone produced a fix.(*) I advise you to run this code outside the IPython notebook.

(*) Be sure to search the mailing lists for the various projects if you're interested. The problem probably stems from IPython's use of ZeroMQ, a multithreaded C library, in conjunction with Python multiprocessing's habit of calling fork without exec in violation of POSIX. Similar problems occur when NumPy calls multithreaded linear algebra libraries in a multiprocessing context.

Fred Foo

You will have to protect your code:

if __name__ == "__main__":
    [Your code]

There seem to be issues with joblib.Parallel when it comes to multiprocessing (n_jobs > 1). More information can be found in the joblib documentation, and there is also a GitHub thread discussing the problem.
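Applied to the question's example, the guard might look like this (a sketch using the modern sklearn.model_selection import, since sklearn.cross_validation was removed in 0.20; max_iter is an assumption added to avoid convergence warnings):

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def run_cv():
    X, y = load_iris(return_X_y=True)
    return cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5, n_jobs=2)

# With the guard, joblib's worker processes can safely re-import this module
# without re-executing the parallel call at import time.
if __name__ == "__main__":
    print(run_cv())
```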

phd
xhlu