
I've got a very peculiar hang happening on my machine when using a Python multiprocessing Pool with numpy and PySide imported. This is the most entangled bug I have seen in my life so far :) The following code:

import numpy as np
import PySide


def hang():
    import multiprocessing
    pool = multiprocessing.Pool(processes = 1)
    pool.map(f, [None])


def f(ignore):
    print('before dot..')
    np.dot(np.zeros((128, 1)), np.zeros((1, 32)))
    print('after dot.')


if __name__ == "__main__":
    hang()
    print('success!')

hangs after printing only 'before dot..'. It is supposed to print:

before dot..
after dot.
success!

I'm no gdb expert, but gdb seems to show that the process exits (or crashes) on the 'np.dot' line:

[Inferior 1 (process 2884) exited normally]

There are several magical modifications I can make to prevent the hang:

  • if I decrease the shape of the arrays going into 'dot' (e.g. from 128 to 127)
  • (!) if I increase the shape of the arrays going into 'dot' from 128 to 256
  • if I do not use multiprocessing and just run the function 'f' directly
  • (!!!) if I comment out the PySide import, which is not used anywhere in the code

Any help is appreciated!

Package versions:

numpy 1.8.1 or 1.7.1
PySide 1.2.1 or 1.2.2

Python version:

Python 2.7.5 (default, Sep 12 2013, 21:33:34) [GCC 4.2.1 Compatible Apple LLVM 5.0 (clang-500.0.68)] on darwin

or

Python 2.7.6 (default, Apr 9 2014, 11:48:52) [GCC 4.2.1 Compatible Apple LLVM 5.1 (clang-503.0.38)] on darwin

Note: while hunting for information, I simplified the original code and question a bit. But here is the stack of updates, kept as history for others who may encounter this bug (e.g. I started with matplotlib, not with PySide).

Update: I narrowed the pylab import down to importing matplotlib with the PySide backend, and updated the code accordingly.

Update: I'm modifying the post to import only PySide, instead of:

import matplotlib
matplotlib.use('qt4agg')
matplotlib.rcParams['backend.qt4']='PySide'
import matplotlib.pyplot

Update: initial statistics suggest it is a Mac-only issue: 3 people have it working on Ubuntu, and 2 people see it hang on a Mac.

Update: printing os.getpid() before the dot operation gives a pid that I don't see in 'top', which apparently means the worker crashes and multiprocessing waits forever for a dead process. For that reason I cannot attach a debugger to it. I edited the main question accordingly.

otognan

3 Answers


This is a general issue with some of the BLAS libraries numpy uses for dot.

Apple Accelerate, and OpenBLAS when built with GNU OpenMP, are known not to be safe to use on both sides of a fork (the parent and the child process that multiprocessing creates). They will deadlock.

This cannot be fixed by numpy, but there are three workarounds:

  • use netlib BLAS, ATLAS, or a git-master OpenBLAS based on pthreads (2.8.0 does not work)
  • use Python 3.4 and its new multiprocessing spawn or forkserver start methods
  • use threading instead of multiprocessing; numpy releases the GIL for its most expensive operations, so you can achieve decent threading speedups on typical desktop machines
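For the second workaround, a minimal sketch of what the start-method switch looks like on Python 3.4+ (the worker function and array shapes here are just placeholders mirroring the question):

```python
import multiprocessing as mp
import numpy as np


# placeholder worker; any BLAS-backed call would do
def f(_):
    return np.dot(np.zeros((128, 1)), np.zeros((1, 32))).shape


result = None
if __name__ == "__main__":
    # "spawn" (or "forkserver") starts workers as fresh interpreters
    # instead of fork()ed copies, so the parent's BLAS state is not shared
    ctx = mp.get_context("spawn")
    with ctx.Pool(processes=1) as pool:
        result = pool.map(f, [None])
    print(result)  # [(128, 32)]
```

You can check which BLAS your numpy build is linked against with `np.show_config()`.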
jtaylor
  • Thank you! Threading is not an option - the dot operation takes up only ~10% of the execution time, so threading doesn't give me the speedup I want. Python 3 is unfortunately also not an option - I'm working in a collaborative environment and we settled on Python 2.7. So can someone point me to how to change the BLAS implementation for numpy? – otognan Jun 14 '14 at 19:02
  • Do you have any more information about the problem, any references? I have a very similar hang, but not in `numpy.dot`, just in `numpy.zeros` (allocating 2MB or so). I also have multiple threads and a child process via multiprocessing. Python 2.7 with Theano. – Albert Mar 24 '15 at 12:11
  • I couldn't find it in the NumPy issue tracker, so I opened a new one. The consensus is pretty much what jtaylor says above. https://github.com/numpy/numpy/issues/5752 – josePhoenix Apr 06 '15 at 21:55

I believe this to be an issue with the multiprocessing module.

Try using the following instead.

import numpy as np
import PySide


def hang():
    import multiprocessing.dummy as multiprocessing
    pool = multiprocessing.Pool(processes=1)
    pool.map(f, [None])


def f(ignore):
    print('before dot..')
    np.dot(np.zeros((128, 1)), np.zeros((1, 32)))
    print('after dot.')


if __name__ == "__main__":
    hang()
    print('success!')
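The reason this avoids the hang: `multiprocessing.dummy` mirrors the `multiprocessing` API but backs it with threads in the same process, so no fork occurs and a fork-unsafe BLAS cannot deadlock. A minimal illustration with a toy function:

```python
import multiprocessing.dummy as mpd


def square(x):
    return x * x


# same Pool API as multiprocessing, but the workers are threads
pool = mpd.Pool(processes=2)
squares = pool.map(square, [1, 2, 3])
pool.close()
pool.join()
print(squares)  # [1, 4, 9]
```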
PsyKzz
  • Yes, it is an issue with combining multiprocessing, PySide and BLAS-backed numpy dot. If you remove one component, it doesn't hang (and your code doesn't hang). However it's not clear how I can proceed - I really want multiprocessing :) – otognan Jun 14 '14 at 18:59
  • Just use multiprocessing.dummy - it uses threads instead of processes. – PsyKzz Jun 16 '14 at 08:59
  • It doesn't avoid the GIL, right? I tried threading; it is too slow for my purposes. – otognan Jun 18 '14 at 06:33

I ran into this exact problem: a deadlock when the child process used numpy.dot, which went away when I reduced the size of the matrix. So instead of a dot product on a matrix of 156000 floats, I performed 3 dot products of 52000 each and concatenated the results. I'm not sure what the max limit is, or whether it depends on the number of child processes, available memory, or other factors. But if the largest matrix that does not deadlock can be identified by trial and error, then the following code should help.

def get_batch(X, update_iter, batchsize):
    # slice out batch number `update_iter`; the last batch may be shorter
    curr_ptr = update_iter*batchsize
    if X.shape[0] - curr_ptr <= batchsize:
        X_i = X[curr_ptr:, :]
    else:
        X_i = X[curr_ptr:curr_ptr+batchsize, :]
    return X_i

def batch_dot(X, w, batchsize):
    y = np.zeros((1,))  # dummy first element, stripped off at the end
    num_batches = X.shape[0]//batchsize  # floor division works on 2.x and 3.x
    if X.shape[0]%batchsize != 0:
        num_batches += 1  # one extra (short) batch for the remainder
    for batch_iter in range(0, num_batches):
        X_batch = get_batch(X, batch_iter, batchsize)
        y_batch = X_batch.dot(w)
        y = np.hstack((y, y_batch))
    return y[1:]
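As a quick sanity check of the batching idea (with arbitrary sizes of my own choosing, not the 52000 above): splitting the dot into row batches and concatenating must reproduce the single large product, which is why this workaround is numerically safe:

```python
import numpy as np

np.random.seed(0)
X = np.random.rand(10, 4)
w = np.random.rand(4)
batchsize = 3

# compute X.dot(w) batch by batch, as batch_dot does,
# then compare against the single big dot
pieces = [X[i:i + batchsize].dot(w) for i in range(0, X.shape[0], batchsize)]
y = np.hstack(pieces)
assert np.allclose(y, X.dot(w))
```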
s2o_ozman