I have a server running an ipcontroller and 12 ipengines. I connect to the controller from my laptop over SSH. I submitted some jobs to the controller through the load-balanced view interface (in non-blocking mode) and kept the message IDs in the AsyncResult object returned by the apply_async() method.

I accidentally lost the message IDs for the jobs and wanted to know if there's a way to retrieve the job IDs (or the results) from the Hub database. I use a SQLite database for the Hub, and I can get the rc.db_query() method to work, but I don't know what to look for.

Does anyone know how to query the Hub database only for message IDs of the jobs I submitted? What's the easiest way of retrieving the job results from the Hub, if I don't have access to the AsyncHubResult object (or their message IDs)?

Thanks!

KartMan

1 Answer

Without the message IDs, you might have a pretty hard time finding the right tasks, unless not many tasks have been submitted.

The query syntax is based on MongoDB's: with the MongoDB backend, queries are passed straight through, and a subset of the simple operators is implemented for SQLite.

Quick summary: a query is a dict. If you use literal values, they are equality tests, but you can use dict values for comparison operators.
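To make those semantics concrete, here is a minimal pure-Python sketch of how such a query dict could be evaluated against a single task record. The `matches` helper and the sample record are hypothetical and only illustrate the idea. This is not the Hub's implementation, and it covers only the `$lt` and `$gt` operators used in this answer.

```python
from datetime import datetime

# Hypothetical illustration of the query semantics; not the Hub's own code.
COMPARISONS = {
    '$lt': lambda value, ref: value < ref,
    '$gt': lambda value, ref: value > ref,
}

def matches(record, query):
    """Return True if `record` (a dict) satisfies every condition in `query`."""
    for key, condition in query.items():
        value = record.get(key)
        if isinstance(condition, dict):
            # dict value: every comparison operator in it must hold
            if value is None:
                return False
            if not all(COMPARISONS[op](value, ref) for op, ref in condition.items()):
                return False
        elif value != condition:
            # literal value: plain equality test
            return False
    return True

# made-up record, for demonstration only
record = {'msg_id': 'abc', 'client_uuid': 'c1',
          'submitted': datetime(2016, 5, 30, 12, 0)}

equality_hit = matches(record, {'client_uuid': 'c1'})
range_hit = matches(record, {'submitted': {'$gt': datetime(2016, 5, 30),
                                           '$lt': datetime(2016, 5, 31)}})
range_miss = matches(record, {'submitted': {'$lt': datetime(2016, 5, 30)}})
```

Several keys in one query are combined with AND in this sketch, so a record has to pass every condition to match.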

You can search by date for any of the timestamps:

  • submitted: arrived at the controller
  • started: arrived on an engine
  • completed: finished on the engine

For instance, to find tasks submitted yesterday:

from datetime import date, time, timedelta, datetime
# round to midnight
today = datetime.combine(date.today(), time())
yesterday = today - timedelta(days=1)

rc.db_query({'submitted': {
  '$lt': today, # less than midnight last night
  '$gt': yesterday, # greater than midnight the night before
}})

or all tasks submitted 1-4 hours ago:

found = rc.db_query({'submitted': {
  '$lt': datetime.now() - timedelta(hours=1),
  '$gt': datetime.now() - timedelta(hours=4),
}})
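If you end up trying several time windows, the query dict can be built by a small helper. `submitted_between` is a hypothetical name, not part of the client API; it only constructs the dict, which you would then pass to rc.db_query().

```python
from datetime import datetime, timedelta

def submitted_between(oldest_hours_ago, newest_hours_ago, now=None):
    """Build a query for tasks submitted within a window of hours before `now`."""
    # take the reference time once so both bounds use the same instant
    now = now or datetime.now()
    return {'submitted': {
        '$gt': now - timedelta(hours=oldest_hours_ago),
        '$lt': now - timedelta(hours=newest_hours_ago),
    }}

# fixed reference time, for demonstration only
query = submitted_between(4, 1, now=datetime(2016, 5, 31, 12, 0))
```

Reading the clock once keeps the two bounds consistent, and passing `now` explicitly makes the window reproducible.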

With the results of that, you can look at keys like client_uuid to retrieve all messages submitted by a given client instance (e.g. a single notebook or script):

# take the client ID from any one of the matching records
client_uuid = found[0]['client_uuid']
all_from_client = rc.db_query({'client_uuid': client_uuid})

Since you are only interested in results at this point, you can specify keys=['msg_id'] to only retrieve the message IDs. We can then use these msg_ids to get all the results produced by a single client session:

# construct list of msg_ids
msg_ids = [r['msg_id'] for r in rc.db_query({'client_uuid': client_uuid}, keys=['msg_id'])]
# client.get_result returns an AsyncResult wrapping the requested tasks
ar = rc.get_result(msg_ids)
# block until the actual return values are available
results = ar.get()

At this point, you have all of the results, but you have lost the association of which results came from which execution. There isn't a lot of info to help you out there, but you might be able to tell by type, timestamps, or perhaps select the 9 final items from a given session.
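As a rough sketch of the timestamp approach, the records can be sorted by submission time and the last few picked out. `last_submitted` is a hypothetical helper, and the records below are made up. It assumes each record carries the `msg_id` and `submitted` keys.

```python
from datetime import datetime

def last_submitted(records, n):
    """Return the msg_ids of the `n` most recently submitted records, oldest first."""
    if n <= 0:
        return []
    ordered = sorted(records, key=lambda r: r['submitted'])
    return [r['msg_id'] for r in ordered[-n:]]

# made-up records, deliberately out of order
records = [
    {'msg_id': 'c', 'submitted': datetime(2016, 5, 30, 12, 2)},
    {'msg_id': 'a', 'submitted': datetime(2016, 5, 30, 12, 0)},
    {'msg_id': 'b', 'submitted': datetime(2016, 5, 30, 12, 1)},
]
latest = last_submitted(records, 2)
```

With a live client, the records could come from rc.db_query({'client_uuid': client_uuid}, keys=['msg_id', 'submitted']), and the returned msg_ids can go to rc.get_result().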

minrk
  • Thanks! This worked perfectly fine for what I was doing. I only submitted one set of jobs, so there was no issue in retrieving the results. – KartMan May 31 '16 at 19:35