
When using a multiprocessing Manager object to create a server and then connecting to that server remotely, the client needs to maintain its connection to the remote server. If the server goes away before the client shuts down, the client will keep trying to connect to the server's expected address forever.

I'm running into a deadlock when client code tries to exit after the server has gone away: my client process never exits.

If I del my remote objects and my client's manager before the server goes down, the process will exit normally, but deleting my client's manager object and remote objects immediately after use is less than ideal.
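
For reference, a sketch of that workaround; get_queue here is a hypothetical registered method standing in for my real remote objects:

from multiprocessing.managers import BaseManager

class QueueManager(BaseManager):
    pass

# hypothetical registration; real code registers its own remote objects
QueueManager.register('get_queue')

m = QueueManager(address=('my.remote.server.dns', 50000), authkey='mykey')
m.connect()
queue = m.get_queue()  # proxy to a remote object

# ... use the proxy ...

# dropping the proxy and the manager closes the client's connection,
# so the process can exit even if the server later disappears
del queue
del m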

Is that the best I can do? Is there another (more proper) way to disconnect from a remote manager object? Is there a way to cleanly exit a client after a server has gone down and/or the connection is lost?

I know socket.setdefaulttimeout doesn't work with multiprocessing, but is there a way to set a connection timeout for the multiprocessing module specifically? This is the code I'm having trouble with:

from multiprocessing.managers import BaseManager
m = BaseManager(address=('my.remote.server.dns', 50000), authkey='mykey')
# this next line hangs forever if my server is not running or gets disconnected
m.connect()

UPDATE: This is broken in multiprocessing. The connection timeout needs to happen at the socket level (and the socket needs to be non-blocking in order to do that), but non-blocking sockets break multiprocessing. There is no way to give up on making a connection if the remote server is not available.
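
One partial workaround is to probe the address with a plain socket and an explicit timeout before handing it to the manager. This only covers the case where nothing is listening; a server that accepts the TCP connection but never completes the manager handshake will still hang. A sketch, reusing the address from above:

import socket

from multiprocessing.managers import BaseManager

address = ('my.remote.server.dns', 50000)

# raises socket.timeout or socket.error if the host is unreachable
# or nothing is listening on the port within 5 seconds
probe = socket.create_connection(address, timeout=5)
probe.close()

m = BaseManager(address=address, authkey='mykey')
m.connect()  # can still hang if the server accepts but never replies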

underrun

2 Answers


is there a way to set a connection timeout for the multiprocessing module specifically?

Yes, but this is a hack. It is my hope that someone with greater python-fu can improve this answer. The timeout for multiprocessing is defined in multiprocessing/connection.py:

# A very generous timeout when it comes to local connections...
CONNECTION_TIMEOUT = 20.
...
def _init_timeout(timeout=CONNECTION_TIMEOUT):
    return time.time() + timeout

Specifically, the way I was able to make it work was by monkey-patching the _init_timeout function as follows:

import time

from multiprocessing import connection
from multiprocessing.managers import BaseManager

def _new_init_timeout():
    # give up on connection attempts after 5 seconds instead of 20
    return time.time() + 5

# modules are singletons, so patching the function here also changes
# the one that multiprocessing.managers uses under the hood
connection._init_timeout = _new_init_timeout

m = BaseManager(address=('somehost', 50000), authkey='secret')
m.connect()

Where 5 is the new timeout value. If there's an easier way, I'm sure someone will point it out. If not, this might be a candidate for a feature request to the multiprocessing dev team. I think something as elementary as setting a timeout should be easier than this. On the other hand, they may have philosophical reasons for not exposing timeout in the API.

Hope that helps.

  • Trouble here is that multiprocessing doesn't properly time out on establishing a socket connection at all. Yes, the module has a timeout set, but unless there is some sort of connection EXCEPTION (busy, etc.) the call to s.connect() in the SocketClient function in the connection.py module will block FOREVER, regardless of what _init_timeout returns. – underrun May 09 '12 at 00:29
  • It might be useful for you to provide an example of implementing this code, and pointing out specifically what about it doesn't work for you. I'm having a hard time understanding your comment in the context of your original question. –  May 09 '12 at 02:08
  • using either your code or mine, the connect() method on BaseManager will block forever if it can't connect to the remote socket (if the service is unavailable on the remote machine for instance). Changing _init_timeout or CONNECTION_TIMEOUT won't actually cause a timeout to happen in the socket module -- it will only handle the case in the multiprocessing module where an exception has been raised before the timeout during the call to the connect() method of a socket object and either retry or fail. if the socket blocks forever multiprocessing can't do anything about it. – underrun May 09 '12 at 03:24
  • Did you actually try my code? Changing the 5 to a 1 in the code I posted changes the timeout. It raises `socket.error connection refused` if it can't connect to the remote manager within the specified time. Works for me. –  Apr 08 '13 at 15:35
  • the actual multiprocessing code that does a 20 second timeout does not time out after 20 seconds... neither your code nor python's stdlib code actually works to cause a connection to time out if the remote machine has accepted the socket connection but the remote service has not responded. – underrun Apr 08 '13 at 17:18
  • "if the remote machine has accepted the socket connection but the remote service has not responded" your question says "# this next line hangs forever if my server is not running or gets disconnected". My code solves (at least) the first part of the disjunctive. As I said previously, please provide a simple example of what you think is broken. –  Apr 08 '13 at 21:09

Can this help you?

#### TEST_JOIN_TIMEOUT

import sys
import time
import multiprocessing

def join_timeout_func():
    print '\tchild sleeping'
    time.sleep(5.5)
    print '\n\tchild terminating'

def test_join_timeout():
    p = multiprocessing.Process(target=join_timeout_func)
    p.start()

    print 'waiting for process to finish'

    # poll with a short join timeout instead of blocking indefinitely
    while 1:
        p.join(timeout=1)
        if not p.is_alive():
            break
        print '.',
        sys.stdout.flush()

if __name__ == '__main__':
    test_join_timeout()

(Taken from the examples on the multiprocessing page, section 16.6, of the Python 2 documentation.)

Usually, timeouts like this are handled by polling in a while loop.
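
The same join-with-timeout idea could be pointed at the blocking connect() itself: run it in a child process and abandon it if it takes too long. A sketch, not tested against the original setup; note the parent ends up without a connected manager, so this only works as a reachability probe:

import multiprocessing

def try_connect():
    # runs in a child process so the parent can abandon it on timeout
    from multiprocessing.managers import BaseManager
    m = BaseManager(address=('my.remote.server.dns', 50000), authkey='mykey')
    m.connect()  # may block forever; the parent gives up for us

if __name__ == '__main__':
    p = multiprocessing.Process(target=try_connect)
    p.start()
    p.join(timeout=5)
    if p.is_alive():
        p.terminate()  # abandon the hung connection attempt
        print 'could not connect within 5 seconds'
    else:
        print 'server answered; safe to connect from the parent'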

Louis
  • I wish that would work... unfortunately multiprocessing.managers.BaseManager.connect() does not have a timeout parameter. – underrun Sep 06 '11 at 15:29