
I have written a REST API in the Python Tornado framework which predicts the answer to a question from a given paragraph.


Here is the Python code for the Tornado handler:

def post(self):
    """
    This function predicts the response from the pre-trained Allen model
    """    
    try:
        request_payload = tornado.escape.json_decode(self.request.body)

        if (request_payload is None):
            return self._return_response(self, { "message": "Invalid request!" }, 400)

        context = request_payload["context"]
        question = request_payload["question"]

        if(context is None or not context):
            return self._return_response(self, { "message": "Context is not provided!" }, 400)

        if(question is None or not question):
            return self._return_response(self, { "message": "Question is not provided!" }, 400)

        # Compute intensive operation which blocks the main thread
        answer_prediction = predictor.predict(passage=str(context), question=str(question))
        best_answer = answer_prediction["best_span_str"] or "Sorry, no answer found for your question!"

        return self._return_response(self, { "answer": best_answer }, 200)

    except KeyError:
        # Return bad request if any of the keys are missing
        return self._return_response(self, { "message": 'Some keys are missing from the request!' }, 400)

    except json.decoder.JSONDecodeError:
        return self._return_response(self, { "message": 'Cannot decode request body!' }, 400)

    except Exception as ex:
        return self._return_response(self, { "message": 'Could not complete the request because of some error at the server!', "cause": ex.args[0], "stack_trace": traceback.format_exc() }, 500)

The problem is that the line:

answer_prediction = predictor.predict(passage=str(context), question=str(question))

blocks the main thread for incoming requests: it waits until the long-running operation is completed, blocking other requests in the meantime, and sometimes the current request times out.


I have read this answer detailing a solution that puts the long-running operation in a queue, but I am not able to follow it.

Also, due to Python's GIL, only one thread can run at a time, which would force me to spawn a separate process to deal with it. Since processes are costly, is there any viable solution to my problem, and how should I deal with this kind of situation?

Here are my questions:

  • How to safely offload compute-intensive operations to a background thread
  • How to handle timeouts and exceptions gracefully
  • How to maintain a queue structure for checking whether the long-running operation has completed

2 Answers


Run the blocking code in a separate thread. Use IOLoop.run_in_executor.

Example:

from functools import partial
from tornado.ioloop import IOLoop

async def post(self):
    ...

    # create a partial object with the keyword arguments
    predict_partial = partial(predictor.predict, passage=str(context), question=str(question))

    answer_prediction = await IOLoop.current().run_in_executor(None, predict_partial)

    ...
  • So the source for [run_in_executor](https://www.tornadoweb.org/en/stable/_modules/tornado/ioloop.html#IOLoop.run_in_executor) says that `max_workers=(cpu_count() * 5)`; what does that mean? – Kunal Mukherjee Mar 03 '19 at 15:14
  • 1
    @KunalMukherjee That is the maximum number of threads. If you have a dual core cpu, this value would be equal to 10. That means your server will run at most 10 threads to compute `predictor.predict` for 10 separate requests simultaneously. If more requests come in, they will have to wait until a thread is free. Your server will not block even if all the threads are occupied. – xyres Mar 03 '19 at 16:45
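If you want to control the pool size yourself instead of relying on the default executor, you can also pass your own ThreadPoolExecutor to run_in_executor. This is only a sketch, not part of the answer above; the pool size of 4 is an illustrative choice:

from concurrent.futures import ThreadPoolExecutor
from functools import partial
from tornado.ioloop import IOLoop

# Illustrative pool size; passing None instead would use the default executor
# with cpu_count() * 5 threads
executor = ThreadPoolExecutor(max_workers=4)

async def post(self):
    ...

    predict_partial = partial(predictor.predict, passage=str(context), question=str(question))

    # Run the blocking prediction on the dedicated pool instead of the default executor
    answer_prediction = await IOLoop.current().run_in_executor(executor, predict_partial)

    ...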

I think you should turn this API call into an async call and return immediately to the caller with a token.

The caller will then use the token later on, in another API call, to check whether the operation has completed.
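
A minimal sketch of that token approach in Tornado, assuming an in-memory jobs dict and a module-level ThreadPoolExecutor; the handler names, the jobs dict and the 202/404 status choices are illustrative, not prescribed by this answer, and predictor is the pre-trained model from the question:

import uuid
from concurrent.futures import ThreadPoolExecutor

import tornado.escape
import tornado.web

executor = ThreadPoolExecutor(max_workers=4)
jobs = {}  # token -> Future returned by executor.submit

class SubmitHandler(tornado.web.RequestHandler):
    def post(self):
        payload = tornado.escape.json_decode(self.request.body)
        token = uuid.uuid4().hex
        # Run the expensive prediction in a worker thread and remember its future
        jobs[token] = executor.submit(
            predictor.predict,
            passage=str(payload["context"]),
            question=str(payload["question"]),
        )
        self.set_status(202)
        self.write({"token": token})

class ResultHandler(tornado.web.RequestHandler):
    def get(self, token):
        future = jobs.get(token)
        if future is None:
            self.set_status(404)
            self.write({"message": "Unknown token!"})
        elif not future.done():
            self.write({"status": "pending"})
        else:
            answer = future.result()["best_span_str"]
            self.write({"status": "done", "answer": answer})

The caller keeps the token from the first response and polls the second endpoint until the status changes from "pending" to "done".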
