
I am trying to integrate ipyparallel as an alternative to multiprocessing in my master-slave architecture.

Currently, all processes share two queues:

  • One for tasks from the master to the slaves
  • One for results from the slaves to the master

At the moment I use multiprocessing.Manager().Queue() queues for the communication. However, it seems they cannot be shared with ipyparallel processes.

The reason I do this at all (and not just via functions) is that "setting up" the workers from scratch is almost as expensive (computation-wise) as performing the calculation itself. I'd prefer to run one function (via map_async or similar) that sets up the environment on the workers, performs the first calculation, pushes the results to the results queue, then fetches (significantly smaller) updates from the tasks queue and repeats the last couple of steps until stopped (again, via the queue).
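Roughly, the current pattern looks like the following sketch (setup, compute, and apply_update are just placeholder names for my actual functions):

import multiprocessing as mp

def worker(task_queue, result_queue):
    env = setup()                          # expensive one-time setup (placeholder)
    result_queue.put(compute(env))         # first, full calculation (placeholder)
    while True:
        update = task_queue.get()          # small update from the master
        if update is None:                 # sentinel tells the worker to stop
            break
        result_queue.put(apply_update(env, update))   # placeholder

if __name__ == "__main__":
    manager = mp.Manager()
    tasks, results = manager.Queue(), manager.Queue()
    workers = [mp.Process(target=worker, args=(tasks, results)) for _ in range(4)]
    for w in workers:
        w.start()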

If there is a better approach/framework for this sort of task (it has to be Python, however), I'm all ears.

Thanks

1 Answer


With IPython parallel, it is common to do "setup" with a DirectView, then distribute smaller tasks that depend on that setup as functions passed to a load-balanced view.

Set up your client and views:

import ipyparallel as ipp

rc = ipp.Client()                    # connect to the running IPython cluster
dview = rc[:]                        # direct view: targets every engine
lbview = rc.load_balanced_view()     # load-balanced view for farming out tasks

Do your setup with the direct view:

dview.execute("data = setup()")
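This assumes setup is already defined on the engines when that line runs; if it lives in an importable module (here a hypothetical mymodule), you can import it there first:

dview.execute("from mymodule import setup")   # hypothetical module name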

You can now rely on that in your tasks using ipp.Reference:

def task(data):
    return analyze(data)

rdata = ipp.Reference('data')        # resolves to each engine's 'data' at call time
ar = lbview.apply(task, rdata)
result = ar.get()

In this way, you can do the setup once everywhere, and then run tasks that depend on that setup in a load-balanced manner.
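As a rough sketch (update_task, process_update, and the updates list are placeholders, not part of ipyparallel), each small update from the question could become one load-balanced task that reuses the per-engine data:

def update_task(data, update):
    # process_update is a placeholder for the real incremental computation;
    # 'data' is the per-engine variable resolved via ipp.Reference
    return process_update(data, update)

async_results = [lbview.apply_async(update_task, rdata, u) for u in updates]
results = [ar.get() for ar in async_results]

Because the Reference is resolved on whichever engine picks up the task, every engine needs to have run the setup step first, which the DirectView step above guarantees.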

minrk