I want to train a lot of models in parallel using IPython parallel with a LoadBalancedView.
However, I need one constraint: after each task is done, the node that ran it must "check in" with another node (let's call it the task arranger) to find out whether it should proceed and, if so, which task it should take next.
This isn't really a DAG; it is just clients communicating with a task arranger node that controls both the timing and the ordering of task completion.
I also need to ensure that if a node fails or drops out, its tasks are picked up by other nodes.
How could I do this in IPython parallel?
EDIT: To clarify, I like IPython parallel's ability to handle tasks, result reporting, socket communication, etc. But I essentially need the power to hand out individual tasks to individual machines, at a time of my choosing, from a master process, and to add new tasks as existing ones are finished or handed out.
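To make the behaviour concrete, here is a rough sketch of the arranger logic I have in mind, in plain Python. It does not use any IPython parallel calls; the class `TaskArranger`, its method names, and the node and model names are all made up for illustration, and I am assuming each node asks for work with some unique id.

```python
import threading
from collections import deque


class TaskArranger:
    """Hypothetical master-side arranger: hands out one task per request."""

    def __init__(self, tasks=()):
        self._lock = threading.Lock()
        self._pending = deque(tasks)  # tasks not yet handed out
        self._in_flight = {}          # worker_id -> task currently assigned

    def add_task(self, task):
        """Append a new task while the run is in progress."""
        with self._lock:
            self._pending.append(task)

    def request_task(self, worker_id):
        """Called by a node when it is free; returns a task or None."""
        with self._lock:
            # Refuse if this node still holds a task or nothing is pending.
            if worker_id in self._in_flight or not self._pending:
                return None
            task = self._pending.popleft()
            self._in_flight[worker_id] = task
            return task

    def report_done(self, worker_id):
        """Mark the node's current task as finished."""
        with self._lock:
            return self._in_flight.pop(worker_id, None)

    def report_failure(self, worker_id):
        """Put a dropped node's task back at the front of the queue."""
        with self._lock:
            task = self._in_flight.pop(worker_id, None)
            if task is not None:
                self._pending.appendleft(task)
            return task


arranger = TaskArranger(["model_a", "model_b", "model_c"])
first = arranger.request_task("node-1")   # node-1 takes the first task
second = arranger.request_task("node-2")  # node-2 takes the next one
arranger.report_failure("node-1")         # node-1 drops out; its task is requeued
retry = arranger.request_task("node-3")   # node-3 picks up node-1's task
```

The open part of my question is how to get this kind of pull-based hand-out, where the master decides per request, while still using IPython parallel for the transport and result reporting.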
EDIT #2: Ah, perhaps I could lock the task table and then manually change the ordering of the tasks in it (?). The lock stops clients from getting further tasks (they must wait), and I can change the order to whatever I like based on the tasks themselves.
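As a sketch of what I mean by locking and reordering, again in plain Python rather than IPython parallel: `LockedTaskTable`, its methods, and the `priority` field are hypothetical names, and I am assuming a client that gets `None` simply waits and asks again later.

```python
import threading


class LockedTaskTable:
    """Hypothetical task table whose hand-out can be paused and reordered."""

    def __init__(self, tasks):
        self._lock = threading.Lock()
        self._tasks = list(tasks)
        self._paused = False

    def pause(self):
        """Stop handing out tasks until resume() is called."""
        with self._lock:
            self._paused = True

    def resume(self):
        """Allow clients to take tasks again."""
        with self._lock:
            self._paused = False

    def reorder(self, key):
        """Sort the remaining tasks; the smallest key is handed out first."""
        with self._lock:
            self._tasks.sort(key=key)

    def take(self):
        """Return the next task, or None if paused or empty."""
        with self._lock:
            if self._paused or not self._tasks:
                return None
            return self._tasks.pop(0)


table = LockedTaskTable([
    {"name": "big", "priority": 2},
    {"name": "small", "priority": 1},
])
table.pause()
blocked = table.take()                       # None: clients must wait
table.reorder(key=lambda t: t["priority"])   # reorder while paused
table.resume()
next_task = table.take()                     # lowest priority value goes first
```

Here the pause flag plays the role of the lock on the table, and the reordering happens while no client can take a task.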