Sometimes, when using a multiprocessing pool in combination with a manager under Python 3.4, lock.acquire() throws a strange TypeError: an integer is required (got type NoneType).
This has come up a few times in my Travis test suite, and I am unable to figure out where it comes from or what it means. Even worse, I cannot reproduce it reliably: it either happens or it doesn't. Usually it doesn't, but once every hundred runs or so it does :-(.
I am completely lost. Maybe someone has encountered something like this before and can give a hint on where to look for the bug's source. Let me start with the full traceback:
Traceback (most recent call last):
  File "/home/travis/miniconda/envs/test-environment/lib/python3.4/multiprocessing/pool.py", line 119, in worker
    result = (True, func(*args, **kwds))
  File "/home/travis/miniconda/envs/test-environment/lib/python3.4/site-packages/pypet-0.1b10.dev0-py3.4.egg/pypet/environment.py", line 150, in _single_run
    traj._store_final(store_data=store_data)
  File "/home/travis/miniconda/envs/test-environment/lib/python3.4/site-packages/pypet-0.1b10.dev0-py3.4.egg/pypet/trajectory.py", line 3271, in _store_final
    store_data=store_data)
  File "/home/travis/miniconda/envs/test-environment/lib/python3.4/site-packages/pypet-0.1b10.dev0-py3.4.egg/pypet/storageservice.py", line 295, in store
    self.acquire_lock()
  File "/home/travis/miniconda/envs/test-environment/lib/python3.4/site-packages/pypet-0.1b10.dev0-py3.4.egg/pypet/storageservice.py", line 284, in acquire_lock
    self._lock.acquire()
  File "/home/travis/miniconda/envs/test-environment/lib/python3.4/multiprocessing/managers.py", line 958, in acquire
    return self._callmethod('acquire', args)
  File "/home/travis/miniconda/envs/test-environment/lib/python3.4/multiprocessing/managers.py", line 731, in _callmethod
    conn.send((self._id, methodname, args, kwds))
  File "/home/travis/miniconda/envs/test-environment/lib/python3.4/multiprocessing/connection.py", line 206, in send
    self._send_bytes(ForkingPickler.dumps(obj))
  File "/home/travis/miniconda/envs/test-environment/lib/python3.4/multiprocessing/connection.py", line 413, in _send_bytes
    self._send(chunk)
  File "/home/travis/miniconda/envs/test-environment/lib/python3.4/multiprocessing/connection.py", line 369, in _send
    n = write(self._handle, buf)
TypeError: an integer is required (got type NoneType)
This only happens in Python 3.4, not in 2.7 :-/.
My library producing the bug is rather comprehensive. However, basically what I do is the following:
import multiprocessing as mp


def my_job(object_with_lock):
    # do stuff in parallel
    returnvalue = 42  # has been computed in the parallel part

    object_with_lock.lock.acquire()
    # do stuff sequentially, file IO and so on
    object_with_lock.lock.release()

    return returnvalue


class MyClassWithLock(object):
    def __init__(self, lock):
        self.lock = lock


def main():
    manager = mp.Manager()
    lock = manager.Lock()
    my_object_with_lock = MyClassWithLock(lock)

    n_cores = 4
    pool = mp.Pool(n_cores)

    # Do the job concurrently:
    iterator = (my_object_with_lock for x in range(100))
    imap_results = pool.imap(my_job, iterator)

    pool.close()
    pool.join()
    del pool

    result_list = [x for x in imap_results]
    manager.shutdown()
    print(result_list)


if __name__ == '__main__':
    main()
This code executes fine (although I haven't tested it 1000 times), but it basically does what I do in the library.
How can something like this produce the error above? Why does lock.acquire() occasionally throw this mysterious TypeError?
EDIT: Using Python 3.4.2 DOES replicate the bug (only in my library) but 3.4.1 does not o.O
Moreover, trying twice seems to overcome the problem, but that does not feel right:
try:
    object_with_lock.lock.acquire()
except TypeError:
    object_with_lock.lock.acquire()
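The same "try again" idea can be written as a small decorator, so the number of attempts is explicit and the failure gets logged. This is only a minimal sketch: the names `retry`, `n_tries` and `acquire_lock` are made up for illustration and are not taken from pypet or from multiprocessing, and retrying still only works around the symptom.

```python
import functools
import logging


def retry(n_tries, exceptions, logger=None):
    """Hypothetical helper: re-run a function up to n_tries times.

    Only the given exception types trigger another attempt; the last
    failure is re-raised unchanged.
    """
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, n_tries + 1):
                try:
                    return func(*args, **kwargs)
                except exceptions as exc:
                    if attempt == n_tries:
                        raise  # out of attempts, propagate the error
                    if logger is not None:
                        logger.error('Try %d of `%s` failed due to: %s',
                                     attempt, func.__name__, exc)
        return wrapper
    return decorator


@retry(n_tries=2, exceptions=TypeError, logger=logging.getLogger(__name__))
def acquire_lock(lock):
    # `lock` is assumed to be the manager.Lock() proxy from the example above
    lock.acquire()
```

Limiting the retry to TypeError keeps other failures (for example a manager that has really shut down) visible instead of silently retrying them.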
2nd EDIT: After using multiprocessing.log_to_stderr() [thanks to dano] I can recover the following log messages.
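For anyone who wants to see the same kind of output, this is roughly how the logging can be switched on. It is a sketch, not the exact code from my library: the helper name `enable_mp_debug_logging` is made up, and I am assuming the workers are forked (as the ForkPoolWorker names suggest), so the call has to happen before the pool is created for the workers to inherit the handler.

```python
import multiprocessing as mp
import multiprocessing.util as mp_util


def enable_mp_debug_logging(level=mp_util.SUBDEBUG):
    """Hypothetical helper: send multiprocessing's internal log to stderr.

    SUBDEBUG (numeric level 5) is the most verbose level and includes the
    finalizer messages; logging.DEBUG would hide those.
    """
    logger = mp.log_to_stderr()
    logger.setLevel(level)
    return logger


if __name__ == '__main__':
    enable_mp_debug_logging()
    # ... create the manager and the pool afterwards ...
```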
Somewhere it loses the connection due to:
[DEBUG/ForkPoolWorker-4] thread 'MainThread' has no more proxies so closing conn
But no error occurs before that; this just comes out of the blue.
Moreover, directly before and after retrying to acquire the lock it says:
[Level 5/ForkPoolWorker-4] finalizer calling <function BaseProxy._decref at 0x7f5890307510> with args (Token(typeid='Lock', address='/tmp/pymp-huxl4h0k/listener-6mm0hc8b', id='7f58903128f0'), b'\x9aF7e\x02\xbc.\xb8\x87\xe0\x00?\xee\xf5\xd6J\x95@\x16\xb7s?\xbf\xe6\xa32a\x16\x13W(\xfb', None, <multiprocessing.util.ForkAwareLocal object at 0x7f58903621c8>, ProcessLocalSet(), <function Client at 0x7f5890375d90>) and kwargs {}
[DEBUG/ForkPoolWorker-4] DECREF '7f58903128f0'
ERROR:pypet.retry:Starting the next try, because I could not execute `acquire_lock` due to: an integer is required (got type NoneType)
[DEBUG/ForkPoolWorker-4] thread 'MainThread' does not own a connection
[DEBUG/ForkPoolWorker-4] making connection to manager
[DEBUG/SyncManager-1] starting server thread to service 'ForkPoolWorker-4'
And apparently the connection is re-established. Still, I don't understand why the connection was lost in the first place.
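One small sanity check that may help narrow this down: as far as I can tell, sending on a connection that is already closed raises an OSError, not a TypeError. If that holds, the TypeError in the traceback would hint that the handle was reset to None while send() was already in flight, i.e. after its own closed-check had passed. That is a guess based on the log messages, not a confirmed diagnosis. The function name below is made up for the demonstration and it uses a plain Pipe rather than a manager connection.

```python
import multiprocessing as mp


def send_on_closed_connection():
    """Return the exception raised when sending on an already-closed Connection."""
    # duplex=False gives a (receive-only, send-only) pair of connections
    receiver, sender = mp.Pipe(duplex=False)
    sender.close()
    try:
        sender.send('ping')
    except Exception as exc:
        return exc
    finally:
        receiver.close()
    return None


if __name__ == '__main__':
    exc = send_on_closed_connection()
    print(type(exc).__name__, exc)
```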