
A couple of questions related to best practices with ipyparallel. I'm attempting to use it to implement a Monte Carlo framework for a model that takes ~15 minutes to run. The idea is to run N engines (via SLURM) and have a "master" process that queues all the required tasks asynchronously and busy-waits for completion, updating a sqlite db with the status of each run.
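Roughly, the setup looks like this (a minimal sketch; rc and the load-balanced view are just my shorthand, and the engines are assumed to have already been started under SLURM):

import ipyparallel as ipp

rc = ipp.Client()                    # connect to the hub for the running cluster
client = rc.load_balanced_view()     # tasks go to whichever engine is free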

I would like to know when a task has been assigned to an engine so I can track its status in my database. I tried using the AsyncResult instances to get the msg_id and query the task database, but the "started" field isn't updated until the task completes.
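For reference, this is roughly what I tried (a sketch; ar is the AsyncResult for one submitted task, and rc is the Client from the sketch above):

msg_id = ar.msg_ids[0]

# Ask the hub's task database for this task's record. In my tests the
# 'started' field stays unset until the task has completed.
rec = rc.db_query({'msg_id': msg_id}, keys=['submitted', 'started', 'completed'])[0]
print(rec['submitted'], rec['started'], rec['completed'])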

It seems there should be a way to receive this notification, or at least to query the hub while the engine is working.

Also, must I do something to avoid engine heartbeat timeout during a long-running task? Is that the purpose of client.spin_thread()?

Thanks!

Rich Plevin

1 Answer


I've answered part of my own question using publish_data. The idea is that instead of calling my main "worker" function directly on each engine, I call a wrapper that invokes publish_data() before and after the worker function, so the client can see a status for each run. For example:

def wrapper(run_id, argDict):
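    # imported inside the wrapper so it resolves on the engine, where this code runs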
    from ipyparallel.engine.datapub import publish_data

    publish_data({run_id : 'running'})
    status = runMonteCarloTrial(argDict)   # runs for ~15 minutes
    publish_data({run_id : status})
    return status

The "master" task calls:

ar = client.map_async(wrapper, listOfArgDicts) 

I then poll ar until all of the results are complete, examining ar.data to identify which trials are currently running and saving each trial's result to a sqlite3 database.
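A stripped-down version of that monitoring loop looks roughly like this (a sketch only: the table name and schema are illustrative, and I'm assuming ar.data can arrive either as a single dict or as a list of per-message dicts):

import sqlite3

conn = sqlite3.connect('runs.sqlite')    # illustrative path and schema
conn.execute('CREATE TABLE IF NOT EXISTS run_status (run_id TEXT PRIMARY KEY, status TEXT)')

def published_status(ar):
    # Normalize ar.data to one {run_id: status} mapping, whether it is a
    # single dict or a list of per-message dicts (assumption of this sketch).
    data = ar.data
    dicts = [data] if isinstance(data, dict) else data
    merged = {}
    for d in dicts:
        if d:
            merged.update(d)
    return merged

while not ar.ready():
    for run_id, status in published_status(ar).items():
        conn.execute('INSERT OR REPLACE INTO run_status VALUES (?, ?)', (run_id, status))
    conn.commit()
    ar.wait(30)    # block up to 30 seconds instead of spinning

# one final pass so every trial's terminal status is recorded
for run_id, status in published_status(ar).items():
    conn.execute('INSERT OR REPLACE INTO run_status VALUES (?, ?)', (run_id, status))
conn.commit()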

This general approach is working for a simple test case. I've yet to explore the timeout question for long-running function calls.

Rich Plevin