I wrote a Python server that uses an OpenVINO network to run inference on incoming requests. To speed things up, I receive requests in multiple threads and would like to run the inferences concurrently. But whatever I do, the times I measure are the same as in the non-concurrent solution, which makes me think I've missed something.
I'm using OpenVINO 2019.1.144 and submitting multiple infer requests to the same plugin and network to try to make the inferences run concurrently.
import os
from openvino.inference_engine import IENetwork, IEPlugin

class Detector:
    def __init__(self, num_of_requests: int = 4):
        self._plugin = IEPlugin("CPU", plugin_dirs=None)
        model_path = './Det/'
        model_xml = os.path.join(model_path, "ssh_graph.xml")
        model_bin = os.path.join(model_path, "ssh_graph.bin")
        net = IENetwork(model=model_xml, weights=model_bin)
        self._input_blob = next(iter(net.inputs))
        # Load the network to the plugin with several infer request slots
        self._exec_net = self._plugin.load(network=net, num_requests=num_of_requests)
        del net
def _async_runner(det, images_subset, idx):
    # Each worker thread owns one infer request slot (request_id=idx)
    for img in images_subset:
        request_handle = det._exec_net.start_async(request_id=idx, inputs={det._input_blob: img})
        request_handle.wait()
from concurrent.futures import ThreadPoolExecutor

def run_async(images):  # These are the images to infer
    det = Detector(num_of_requests=4)
    multiplier = len(images) // 4
    with ThreadPoolExecutor(4) as pool:
        futures = []
        for idx in range(4):
            images_subset = images[idx * multiplier:(idx + 1) * multiplier]
            futures.append(pool.submit(_async_runner, det, images_subset, idx))
When I run 800 inferences in sync mode, I get an average run time of 290 ms. When I run in async mode, I get an average run time of 280 ms. That is not a substantial improvement. What am I doing wrong?
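For context, this is the kind of overlap I'm trying to get: several requests in flight at the same time, with the waits deferred until a whole group has been started. The snippet below is only a rough sketch of that idea (run_batched is a name I made up for illustration, not part of my server), using the same 2019 API calls as my code above:

def run_batched(det, images, num_of_requests=4):
    # Sketch only: start a group of requests on all available slots,
    # then wait on them, so the inferences have a chance to overlap.
    for start in range(0, len(images), num_of_requests):
        batch = images[start:start + num_of_requests]
        handles = []
        for idx, img in enumerate(batch):
            handles.append(det._exec_net.start_async(
                request_id=idx, inputs={det._input_blob: img}))
        for handle in handles:
            handle.wait()  # blocks until this particular request finishes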