
I've got a very peculiar hang happening on my machine when using a Python multiprocessing Pool with numpy and PySide imported. This is the most entangled bug I have seen in my life so far :) The following code:

import numpy as np
import PySide


def hang():
    import multiprocessing
    pool = multiprocessing.Pool(processes = 1)
    pool.map(f, [None])


def f(ignore):
    print('before dot..')
    np.dot(np.zeros((128, 1)), np.zeros((1, 32)))
    print('after dot.')


if __name__ == "__main__":
    hang()
    print('success!')

hangs after printing only 'before dot..'. It is supposed to print:

before dot..
after dot.
success!

I'm no gdb expert, but gdb seems to show that the process exits (or crashes) on the 'np.dot' line:

[Inferior 1 (process 2884) exited normally]

There are several magical modifications I can make to prevent the hang:

  • if I decrease the shape of the arrays going into 'dot' (e.g. from 128 to 127)
  • (!) if I increase the shape of the arrays going into 'dot' from 128 to 256
  • if I do not use multiprocessing and just run the function 'f' directly
  • (!!!) if I comment out the PySide import, which is not used anywhere in the code

Any help is appreciated!

Package versions:

numpy 1.8.1 or 1.7.1
PySide 1.2.1 or 1.2.2

Python version:

Python 2.7.5 (default, Sep 12 2013, 21:33:34) [GCC 4.2.1 Compatible Apple LLVM 5.0 (clang-500.0.68)] on darwin

or

Python 2.7.6 (default, Apr 9 2014, 11:48:52) [GCC 4.2.1 Compatible Apple LLVM 5.1 (clang-503.0.38)] on darwin

Note: while hunting for information, I simplified the original code and question a bit. But here is the stack of updates, kept as history for others who may encounter this bug (e.g. I started with matplotlib, not with PySide).

Update: I narrowed the pylab import down to importing matplotlib with the PySide backend, and updated the code accordingly.

Update: I'm modifying the post to import only PySide, instead of:

import matplotlib
matplotlib.use('qt4agg')
matplotlib.rcParams['backend.qt4']='PySide'
import matplotlib.pyplot

Update: initial statistics suggest it is a Mac-only issue: 3 people have it working on Ubuntu, and 2 people see it hang on a Mac.

Update: printing os.getpid() before the dot operation gives a pid that I don't see in 'top', which apparently means the worker crashes and multiprocessing waits forever for a dead process. For that reason I cannot attach a debugger to it. I edited the main question accordingly.

otognan

3 Answers


This is a general issue with some of the BLAS libraries numpy uses for dot.

Apple Accelerate, and OpenBLAS when built with GNU OpenMP, are known not to be safe to use on both sides of a fork (the parent and the child process that multiprocessing creates). They will deadlock.

This cannot be fixed by numpy, but there are three workarounds:

  • use netlib BLAS, ATLAS, or a git-master OpenBLAS based on pthreads (2.8.0 does not work)
  • use Python 3.4 and its new multiprocessing spawn or forkserver start methods
  • use threading instead of multiprocessing; numpy releases the GIL for its most expensive operations, so you can achieve decent threading speedups on typical desktop machines
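For the second workaround, a minimal sketch of what the start-method switch looks like on Python 3.4+ (the worker function and array shapes here are just placeholders mirroring the question):

```python
import multiprocessing as mp
import numpy as np


# placeholder worker; any BLAS-backed call would do
def f(_):
    return np.dot(np.zeros((128, 1)), np.zeros((1, 32))).shape


result = None
if __name__ == "__main__":
    # "spawn" (or "forkserver") starts workers as fresh interpreters
    # instead of fork()ed copies, so the parent's BLAS state is not shared
    ctx = mp.get_context("spawn")
    with ctx.Pool(processes=1) as pool:
        result = pool.map(f, [None])
    print(result)  # [(128, 32)]
```

You can check which BLAS your numpy build is linked against with `np.show_config()`.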
jtaylor
  • Thank you! Threading is not an option - the dot operation takes up only ~10% of the execution time, so threading doesn't give me the speedup I want. Python 3 is unfortunately also not an option - I'm working in a collaborative environment and we settled on Python 2.7. So can someone point me to how to change the BLAS implementation for numpy? – otognan Jun 14 '14 at 19:02
  • Do you have any more information about the problem, any references? I have a very similar hang, but not in `numpy.dot`, just in `numpy.zeros` (allocating 2MB or so). I also have multiple threads and a child process via multiprocessing. Python 2.7 with Theano. – Albert Mar 24 '15 at 12:11
  • I couldn't find it in the NumPy issue tracker, so I opened a new one. The consensus is pretty much what jtaylor says above. https://github.com/numpy/numpy/issues/5752 – josePhoenix Apr 06 '15 at 21:55

I believe this to be an issue with the multiprocessing module.

Try using the following instead.

import numpy as np
import PySide


def hang():
    import multiprocessing.dummy as multiprocessing
    pool = multiprocessing.Pool(processes=1)
    pool.map(f, [None])


def f(ignore):
    print('before dot..')
    np.dot(np.zeros((128, 1)), np.zeros((1, 32)))
    print('after dot.')


if __name__ == "__main__":
    hang()
    print('success!')
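The reason this avoids the hang: `multiprocessing.dummy` mirrors the `multiprocessing` API but backs it with threads in the same process, so no fork occurs and a fork-unsafe BLAS cannot deadlock. A minimal illustration with a toy function:

```python
import multiprocessing.dummy as mpd


def square(x):
    return x * x


# same Pool API as multiprocessing, but the workers are threads
pool = mpd.Pool(processes=2)
squares = pool.map(square, [1, 2, 3])
pool.close()
pool.join()
print(squares)  # [1, 4, 9]
```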
PsyKzz
  • Yes, it is an issue with combining multiprocessing, PySide and BLAS-backed numpy dot. If you remove one component, it doesn't hang (and your code doesn't hang). However it's not clear how I can proceed - I really want multiprocessing :) – otognan Jun 14 '14 at 18:59
  • Just use multiprocessing.dummy - it uses threads instead of processes. – PsyKzz Jun 16 '14 at 08:59
  • It doesn't avoid the GIL, right? I tried threading; it is too slow for my purposes. – otognan Jun 18 '14 at 06:33

I ran into this exact problem: a deadlock when the child process used numpy.dot, which went away when I reduced the size of the matrix. So instead of a dot product on a matrix of 156000 floats, I performed 3 dot products of 52000 each and concatenated the results. I'm not sure what the max limit is, or whether it depends on the number of child processes, available memory, or other factors. But if the largest matrix that does not deadlock can be identified by trial and error, then the following code should help.

def get_batch(X, update_iter, batchsize):
    # slice out batch number `update_iter`; the last batch may be shorter
    curr_ptr = update_iter*batchsize
    if X.shape[0] - curr_ptr <= batchsize:
        X_i = X[curr_ptr:, :]
    else:
        X_i = X[curr_ptr:curr_ptr+batchsize, :]
    return X_i

def batch_dot(X, w, batchsize):
    y = np.zeros((1,))  # dummy first element, stripped off at the end
    num_batches = X.shape[0]//batchsize  # floor division works on 2.x and 3.x
    if X.shape[0]%batchsize != 0:
        num_batches += 1  # one extra (short) batch for the remainder
    for batch_iter in range(0, num_batches):
        X_batch = get_batch(X, batch_iter, batchsize)
        y_batch = X_batch.dot(w)
        y = np.hstack((y, y_batch))
    return y[1:]
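As a quick sanity check of the batching idea (with arbitrary sizes of my own choosing, not the 52000 above): splitting the dot into row batches and concatenating must reproduce the single large product, which is why this workaround is numerically safe:

```python
import numpy as np

np.random.seed(0)
X = np.random.rand(10, 4)
w = np.random.rand(4)
batchsize = 3

# compute X.dot(w) batch by batch, as batch_dot does,
# then compare against the single big dot
pieces = [X[i:i + batchsize].dot(w) for i in range(0, X.shape[0], batchsize)]
y = np.hstack(pieces)
assert np.allclose(y, X.dot(w))
```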
s2o_ozman