
Sometimes, when using a multiprocessing pool in combination with a manager under Python 3.4, lock.acquire() throws a strange TypeError: an integer is required (got type NoneType).

This has come up a few times in my Travis test suite, and I am unable to figure out where it comes from or what it means. Even worse, I cannot reproduce it reliably: it either happens or it doesn't. Usually it doesn't, but about once every hundred runs it does :-(.

I am completely lost. Maybe someone has encountered something like this before and can give a hint on where to look for the bug's source. Let me start with the full traceback:

Traceback (most recent call last):
  File "/home/travis/miniconda/envs/test-environment/lib/python3.4/multiprocessing/pool.py", line 119, in worker
    result = (True, func(*args, **kwds))
  File "/home/travis/miniconda/envs/test-environment/lib/python3.4/site-packages/pypet-0.1b10.dev0-py3.4.egg/pypet/environment.py", line 150, in _single_run
    traj._store_final(store_data=store_data)
  File "/home/travis/miniconda/envs/test-environment/lib/python3.4/site-packages/pypet-0.1b10.dev0-py3.4.egg/pypet/trajectory.py", line 3271, in _store_final
    store_data=store_data)
  File "/home/travis/miniconda/envs/test-environment/lib/python3.4/site-packages/pypet-0.1b10.dev0-py3.4.egg/pypet/storageservice.py", line 295, in store
    self.acquire_lock()
  File "/home/travis/miniconda/envs/test-environment/lib/python3.4/site-packages/pypet-0.1b10.dev0-py3.4.egg/pypet/storageservice.py", line 284, in acquire_lock
    self._lock.acquire()
  File "/home/travis/miniconda/envs/test-environment/lib/python3.4/multiprocessing/managers.py", line 958, in acquire
    return self._callmethod('acquire', args)
  File "/home/travis/miniconda/envs/test-environment/lib/python3.4/multiprocessing/managers.py", line 731, in _callmethod
    conn.send((self._id, methodname, args, kwds))
  File "/home/travis/miniconda/envs/test-environment/lib/python3.4/multiprocessing/connection.py", line 206, in send
    self._send_bytes(ForkingPickler.dumps(obj))
  File "/home/travis/miniconda/envs/test-environment/lib/python3.4/multiprocessing/connection.py", line 413, in _send_bytes
    self._send(chunk)
  File "/home/travis/miniconda/envs/test-environment/lib/python3.4/multiprocessing/connection.py", line 369, in _send
    n = write(self._handle, buf)
TypeError: an integer is required (got type NoneType)

This only happens in Python 3.4, not in 2.7 :-/.

The library that produces the bug is rather large. Basically, however, what I do is the following:

import multiprocessing as mp


def my_job(object_with_lock):
    # do stuff in parallel
    returnvalue = 42  # has been computed in the parallel part

    object_with_lock.lock.acquire()
    # do stuff sequentially, file IO and so on
    object_with_lock.lock.release()
    return returnvalue


class MyClassWithLock(object):
    def __init__(self, lock):
        self.lock = lock


def main():
    manager = mp.Manager()
    lock = manager.Lock()
    my_object_with_lock = MyClassWithLock(lock)

    n_cores = 4
    pool = mp.Pool(n_cores)

    # Do the job concurrently:
    iterator = (my_object_with_lock for x in range(100))
    imap_results = pool.imap(my_job, iterator)

    pool.close()
    pool.join()
    del pool

    result_list = [x for x in imap_results]

    manager.shutdown()

    print(result_list)


if __name__ == '__main__':
    main()

This code executes fine (although I haven't tested it 1000 times), but it is essentially what my library does.

How can something like this produce the error from above? Why does lock.acquire() throw this mysterious TypeError occasionally?


EDIT: Python 3.4.2 DOES reproduce the bug (only in my library), but 3.4.1 does not o.O

Moreover, trying twice seems to overcome the problem, but that does not feel right:

try:
    object_with_lock.lock.acquire()
except TypeError:
    object_with_lock.lock.acquire()
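
The same workaround can be written as a small bounded-retry helper, so that a persistent failure is still raised instead of being retried forever. This is only a sketch: call_with_retry and its parameters are hypothetical names, not part of multiprocessing or pypet, and it merely works around the symptom without explaining why the connection is lost.

```python
import time


def call_with_retry(func, retries=1, delay=0.0, exceptions=(TypeError,)):
    """Call func(); if it raises one of `exceptions`, retry up to `retries` more times.

    Hypothetical helper for illustration only.
    """
    attempt = 0
    while True:
        try:
            return func()
        except exceptions:
            if attempt >= retries:
                raise  # out of retries: surface the original error
            attempt += 1
            time.sleep(delay)  # optional pause before the next attempt
```

In my_job, the call would then read call_with_retry(object_with_lock.lock.acquire) instead of the nested try/except.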

2nd EDIT: After using multiprocessing.log_to_stderr() [thanks to dano] I can recover the following log messages. Somewhere it loses the connection due to:

[DEBUG/ForkPoolWorker-4] thread 'MainThread' has no more proxies so closing conn

But no error occurs before that; this just comes out of the blue.

Moreover, directly before and after retrying to acquire the lock it says:

[Level 5/ForkPoolWorker-4] finalizer calling <function BaseProxy._decref at 0x7f5890307510> with args (Token(typeid='Lock', address='/tmp/pymp-huxl4h0k/listener-6mm0hc8b', id='7f58903128f0'), b'\x9aF7e\x02\xbc.\xb8\x87\xe0\x00?\xee\xf5\xd6J\x95@\x16\xb7s?\xbf\xe6\xa32a\x16\x13W(\xfb', None, <multiprocessing.util.ForkAwareLocal object at 0x7f58903621c8>, ProcessLocalSet(), <function Client at 0x7f5890375d90>) and kwargs {}
[DEBUG/ForkPoolWorker-4] DECREF '7f58903128f0'
ERROR:pypet.retry:Starting the next try, because I could not execute `acquire_lock` due to: an integer is required (got type NoneType)
[DEBUG/ForkPoolWorker-4] thread 'MainThread' does not own a connection
[DEBUG/ForkPoolWorker-4] making connection to manager
[DEBUG/SyncManager-1] starting server thread to service 'ForkPoolWorker-4'

And apparently the connection is re-established. Still I don't understand why the connection was lost in the first place.
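
For reference, log output like the above can be obtained with a setup along the following lines. This is a minimal sketch: enable_mp_logging is a hypothetical wrapper name, while log_to_stderr() and the SUBDEBUG level (numeric value 5, which shows up as "Level 5" in the messages) come from the standard library. It needs to run before the pool is created so that forked workers inherit the logger configuration.

```python
import logging
import multiprocessing as mp
import multiprocessing.util as mp_util


def enable_mp_logging(level=logging.DEBUG):
    """Send multiprocessing's internal log messages to stderr.

    Hypothetical convenience wrapper around mp.log_to_stderr().
    """
    logger = mp.log_to_stderr()  # attaches a stderr handler to the 'multiprocessing' logger
    logger.setLevel(level)
    return logger


if __name__ == '__main__':
    # SUBDEBUG (5) is more verbose than DEBUG and includes the finalizer messages.
    enable_mp_logging(mp_util.SUBDEBUG)
```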

SmCaterpillar
  • The error indicates sometimes the connection to the manager process is closed when you try to call `acquire` on the `Manager.Lock()`. It's hard to say why/how without a reproducible example. – dano Mar 26 '15 at 16:56
  • But why does retrying work in such a case? Is the connection re-established? – SmCaterpillar Mar 26 '15 at 16:58
  • Yeah, looks like it. You can add this at the beginning of your script to enable debug logging in `multiprocessing`, which might help explain what's happening: `logger = mp.log_to_stderr() ; logger.setLevel(logging.INFO)` – dano Mar 26 '15 at 17:01
  • Well, yeah. In my library I use logging, so I see that retrying actually works and the program reaches the except block. – SmCaterpillar Mar 26 '15 at 17:03
  • No, I mean the `multiprocessing` library itself has debug logging in it. If you turn it on, you'll see messages being logged by `multiprocessing` that might help explain what's happening internally. – dano Mar 26 '15 at 17:04
  • Were you able to solve this problem? Because I think I have the same problem, see my question https://stackoverflow.com/questions/46002812/multiprocessing-on-linux-works-with-spawn-only?noredirect=1#comment79107173_46002812 @SmCaterpillar – CodeNoob Sep 06 '17 at 10:59

0 Answers