This code works fine on Linux but fails on Windows (which is expected). I know that on Linux the multiprocessing module uses fork() to spawn a new process, so the file descriptors owned by the parent (i.e. the opened socket) are inherited by the child. However, it was my understanding that any data you send via multiprocessing needs to be pickleable. On both Windows and Linux, the socket object is not pickleable.

from socket import socket, AF_INET, SOCK_STREAM
import multiprocessing as mp
import pickle

sock = socket(AF_INET, SOCK_STREAM)
sock.connect(("www.python.org", 80))
sock.sendall(b"GET / HTTP/1.1\r\nHost: www.python.org\r\n\r\n")

try:
    pickle.dumps(sock)
except TypeError:
    print("sock is not pickleable")

def foo(obj):
    print("Received: {}".format(type(obj)))
    data, done = [], False
    while not done:
        tmp = obj.recv(1024)
        done = len(tmp) < 1024
        data.append(tmp)
    data = b"".join(data)
    print(data.decode())


proc = mp.Process(target=foo, args=(sock,))
proc.start()
proc.join()

My question is: why can a socket object, a demonstrably non-pickleable object, be passed in with multiprocessing? Does it not use pickle on Linux as it does on Windows?

Mike McKerns
Goodies
  • The file descriptors aren't "sent", they're just there. – Ignacio Vazquez-Abrams Dec 05 '16 at 08:31
  • What do you mean? I could be (and must be partially) wrong, but I thought anything in the "args" parameter needs to be pickleable. Is this not the case? – Goodies Dec 05 '16 at 08:31
  • Arguments aren't pickled, they're just passed to the function in the subprocess. Only objects transmitted *between* processes need to be pickled. – Ignacio Vazquez-Abrams Dec 05 '16 at 08:37
  • And who determines what that is? The OS presumably. So the actual socket object itself is not sent between processes, but the underlying file descriptor is maintained? I understand that it's inherited by the child, but I thought the socket object was passed to the child, too, not just the simple descriptor. – Goodies Dec 05 '16 at 08:40
  • No, the socket object, as with all other parts of the process's memory map, *already exists* within the child since forking produces an *almost exact* copy of the parent. – Ignacio Vazquez-Abrams Dec 05 '16 at 08:41
  • This didn't quite explain it because there's an even simpler version of the code that does not even connect the socket. It merely creates a socket and passes the object to `foo()`. No send/recv or anything of the sort. On Linux, it works fine. On Windows, it does not and I saw a pickling error in the traceback. However this is an issue caused by the multiprocessing module importing the current module and simply requires me to check `if __name__ == "__main__"` to prevent recursion. Then it works. – Goodies Dec 05 '16 at 08:50

2 Answers

On Unix platforms, sockets and other file descriptors can be sent to a different process using Unix domain (AF_UNIX) sockets, so sockets can be pickled in the context of multiprocessing.
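
As a rough illustration of that mechanism (not multiprocessing's own code), the sketch below passes the read end of a pipe across a connected pair of AF_UNIX sockets with socket.send_fds and socket.recv_fds. It assumes a Unix system and Python 3.9 or later. pass_fd_over_unix_socket is a hypothetical helper name, and both ends live in one process only to keep the example short.

```python
import os
import socket


def pass_fd_over_unix_socket(message):
    """Send a pipe's read end over an AF_UNIX socket pair, then read from the received copy."""
    # Two connected Unix domain sockets; in real use each end would be in a different process.
    sender, receiver = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    read_fd, write_fd = os.pipe()
    with sender, receiver:
        # The descriptor travels as ancillary data next to one byte of ordinary payload.
        socket.send_fds(sender, [b"x"], [read_fd])
        _payload, fds, _flags, _addr = socket.recv_fds(receiver, 1, 1)
    received_fd = fds[0]  # a new descriptor number referring to the same pipe
    try:
        os.write(write_fd, message)
        return os.read(received_fd, len(message))
    finally:
        for fd in (read_fd, write_fd, received_fd):
            os.close(fd)
```

The received descriptor is a distinct number that refers to the same open pipe, which is what allows a socket "pickled" this way to be rebuilt on the other side.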

Instead of a regular pickler, the multiprocessing module uses a special pickler, ForkingPickler, to pickle sockets and file descriptors so that they can be unpickled in a different process. This is only possible because it is known where the pickled instance will be unpickled; it wouldn't make sense to pickle a socket or file descriptor and send it across machine boundaries.
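
One way to see the difference, as a sketch: plain pickle refuses a socket, while a ForkingPickler instance carries a reducer registered for socket.socket. compare_picklers is a hypothetical name, and looking inside dispatch_table relies on how ForkingPickler is built in current CPython 3 releases, so treat that part as an assumption rather than a documented guarantee.

```python
import io
import pickle
import socket
from multiprocessing.reduction import ForkingPickler


def compare_picklers():
    """Return (plain pickle succeeded?, ForkingPickler has a socket reducer?)."""
    # An unconnected socket is enough; no network access is needed.
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            pickle.dumps(sock)
            plain_pickle_ok = True
        except TypeError:
            plain_pickle_ok = False
    # Assumption: ForkingPickler copies its extra reducers into dispatch_table on init.
    pickler = ForkingPickler(io.BytesIO())
    has_socket_reducer = socket.socket in pickler.dispatch_table
    return plain_pickle_ok, has_socket_reducer
```

The example only inspects the registration and does not actually serialize the socket, since doing so outside a process launch starts background machinery for handing over the descriptor.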

For Windows there are similar mechanisms for open file handles.

mata

I think the issue is that multiprocessing uses a different pickler for Windows and non-Windows systems. On Windows, there is no real fork(), and the pickling that is done is equivalent to pickling across machine boundaries (i.e. distributed computing). On non-Windows systems, objects (like file descriptors) can be shared across process boundaries. Thus, pickling on Windows systems (with pickle) is more limited.
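
A minimal sketch of the fork case, assuming a Unix system where the "fork" start method is available (child and run_with_fork are hypothetical names). The arguments include a lambda, which plain pickle cannot serialize, to show that nothing in args is pickled when the child is a forked copy of the parent.

```python
import multiprocessing as mp
import socket


def child(sock, transform):
    # Both arguments already exist in the forked child's memory.
    sock.sendall(transform(b"hello"))
    sock.close()


def run_with_fork():
    ctx = mp.get_context("fork")  # assumption: Unix only, not available on Windows
    parent_sock, child_sock = socket.socketpair()
    # A lambda cannot be pickled by the standard pickler, yet it is accepted here.
    proc = ctx.Process(target=child, args=(child_sock, lambda b: b.upper()))
    proc.start()
    child_sock.close()  # the parent no longer needs its copy of the child's end
    chunks = []
    while True:
        chunk = parent_sock.recv(1024)
        if not chunk:  # empty read: the child closed its end
            break
        chunks.append(chunk)
    proc.join()
    parent_sock.close()
    return b"".join(chunks)
```

With the "spawn" start method, which is the default on Windows, the same arguments would have to be pickled to reach the new interpreter, so the lambda would be rejected.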

The multiprocessing package does use copy_reg (named copyreg in Python 3) to register a few object types for pickling, and one of those types is a socket. However, the serialization of the socket object that is used on Windows is more limited due to the Windows pickler being weaker.

On a related note, if you do want to send a socket object with multiprocessing on Windows, you can… you just have to use the package multiprocess, which uses dill instead of pickle. dill has a better serializer that can pickle socket objects on any OS, and thus sending the socket object with multiprocess works in either case.

dill has the function copy, which is essentially loads(dumps(object)) and is useful for checking that an object can be serialized. dill also has check, which performs copy but with the more restrictive "Windows" style fork-like operation. This allows users on non-Windows systems to emulate a copy on a Windows system, or across distributed resources.

>>> import dill
>>> import socket
>>> s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
>>> s.connect(('www.python.org', 80))
>>> s.sendall(b'GET / HTTP/1.1\r\nHost: www.python.org\r\n\r\n')
>>> 
>>> dill.copy(s)
<socket._socketobject object at 0x10e55b9f0>
>>> dill.check(s)
<socket._socketobject object at 0x1059628a0>
>>> 

In short, the difference is caused by the pickler that multiprocessing uses on Windows being different from the pickler it uses on non-Windows systems. However, it is possible (and easy) to have it work on any OS by using a better serializer (as is used in multiprocess).

Mike McKerns