How do I do async inference on OpenVino

Question

I wrote a python server that uses an OpenVino network to run inference on incoming requests. In order to speed things up, I receive requests in multiple threads, and I would like to run the inferences concurrently. It seems that whatever I do, the times I get are the same as non-concurrent solutions - which makes me think I've missed something.

I'm writing it in Python, using openvino 2019.1.144. I'm using multiple requests to the same plugin and network in order to try to make the inferences run concurrently.

def __init__(self, num_of_requests: int = 4):
   self._plugin = IEPlugin("CPU", plugin_dirs=None)
   model_path = './Det/'
   model_xml = os.path.join(model_path, "ssh_graph.xml")
   model_bin = os.path.join(model_path, "ssh_graph.bin")
   net = IENetwork(model=model_xml, weights=model_bin)
   self._input_blob = next(iter(net.inputs))

   # Load network to the plugin
   self._exec_net = self._plugin.load(network=net, num_requests=num_of_requests)
   del net

def _async_runner(detect, images_subset, idx):
    for img in images_subset:
        request_handle = self._exec_net.start_async(request_id=idx, inputs={self._input_blob: img})
        request_handle.wait()


def run_async(images):  # These are the images to infer
    det = Detector(num_of_requests=4)
    multiplier = int(len(images)/4)
    with ThreadPoolExecutor(4) as pool:
        futures = []
        for idx in range(0,3):
            images_subset = images[idx*multiplier:(idx+1)*multiplier-1]
            futures.append(pool.submit(_async_runner, det.detect, images_subset, idx))

When I run 800 inferences in sync mode, I get an avg. run time of 290ms When I run in async mode I get avg run time of 280ms. These are not substantial improvements. What am I doing wrong?

score 2 · Accepted Answer · answered Jul 25 '19 at 08:17

If you use wait(), the execution thread blocks until the result is available. If you want to use a truly async mode, you will need wait(0) which does not block the execution. Just launch the inference whenever you need and store the request_id. Then, you can check if the results are available checking if the returned value of wait(0) is 0. Be careful not to use the same request_id while the IE is doing the inference, that will cause a collision and it'll raise an exception.

However, in the code you provided, you cannot do this, because you are creating a thread pool in wich each thread executes inference of the image subset into a unique request_id. In fact, this is a parallel execution wich will give you a pretty fine performance, but it isn't "async" mode.

A truly async mode would be something like this:

while still_items_to_infer():
    get_item_to_infer()
    get_unused_request_id()
    launch_infer()
    do_someting()
    if results_available():
        get_inference_results()
        free_request_id()
        #This may be in a new thread
        process_inference_results()

This way, you are dispatching continuous inferences while waiting for them to finish.

Sorry, to another question. I have tried the async inference and integrated it with flask api for web request but it failed due `async_lock` after more than single request. Have you have any idea what is best way to integrate with API? — Pankaj, May 14 '20 at 10:21

score 1 · Answer 2 · answered Jun 26 '19 at 07:03

1

You can refer to a sample code from C:\Program Files (x86)\IntelSWTools\openvino_2019.1.144\inference_engine\samples\python_samples\object_detection_demo_ssd_async\object_detection_demo_ssd_async.py or similar samples from the python_samples directory to check the way async mode is addressed.

answered Jun 26 '19 at 07:03

Sandhiya - Intel

290
2
9

the problem with the sample files is that they don't really perform async inference, as far as I understand. They use the async method, and immediately after that perform wait(). This means that all inference are performed **sequentially**. Am I missing something? – Yoav Moran Jul 21 '19 at 08:36
@YoavMoran They make a new request, but wait for the old one. Pay attention to request ids. – mentalmushroom Apr 25 '20 at 17:30

How do I do async inference on OpenVino

2 Answers2