
I have a main Python process and a bunch of workers created by the main process using os.fork().

I need to pass large and fairly involved data structures from the workers back to the main process. What existing libraries would you recommend for that?

The data structures are a mix of lists, dictionaries, numpy arrays, custom classes (which I can tweak) and multi-layer combinations of the above.

Disk I/O should be avoided. If I could also avoid creating copies of the data -- for example by having some kind of shared-memory solution -- that would be nice too, but is not a hard constraint.
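To make the "shared memory" part concrete, here is the kind of arrangement I have in mind -- a minimal sketch only, with a made-up array size and dtype: a numpy array backed by an anonymous shared mmap, which a forked worker can fill in place so the parent sees the result without any copying.

import mmap
import os
import numpy as np

n = 1_000_000
mm = mmap.mmap(-1, n * 8)                   # anonymous mapping, shared across fork on Linux
arr = np.frombuffer(mm, dtype=np.float64)   # zero-copy numpy view onto the mapping

if os.fork() == 0:
    arr[:] = 42.0       # worker writes directly into the shared region
    os._exit(0)
else:
    os.wait()
    print(arr[0])       # parent reads 42.0 without a copy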

For the purposes of this question, it is mandatory that the workers are created using os.fork(), or a wrapper thereof that would clone the master process's address space.

This only needs to work on Linux.

    "Disk I/O should be avoided." I assume that doesn't include paging, in situations where the amount of data in memory grows large enough that paging would be needed. – JAB Jun 03 '11 at 13:32
    @JAB: Your assumption is correct. However, the eventual solution should make judicious use of memory. – NPE Jun 03 '11 at 13:33

1 Answer


multiprocessing's queue implementation works even when the workers are created with a raw os.fork(): internally, it pickles the data and sends it through a pipe.

import multiprocessing
import os

q = multiprocessing.Queue()
if os.fork() == 0:      # child process
    print(q.get())      # outputs: 5
    os._exit(0)         # don't fall through into the parent's code
else:                   # parent process
    q.put(5)
    os.wait()           # reap the child
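Since the question's payloads are nested containers and numpy arrays, note that the same queue carries any picklable object. A hedged sketch of that -- the structure and field names below are invented, and the custom classes are assumed to be picklable:

import multiprocessing
import os
import numpy as np

q = multiprocessing.Queue()
if os.fork() == 0:
    # worker builds its (invented) result structure and sends it back
    result = {"scores": np.arange(5), "meta": [{"id": 1}, {"id": 2}]}
    q.put(result)       # pickled and written to the queue's pipe
    os._exit(0)
else:
    data = q.get()      # unpickled into a copy in the parent
    print(data["scores"].sum(), data["meta"])
    os.wait()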
Spferical