As stated in the title I'm trying to parallelize the prediction of multiple (sequential) keras models.
The challenge is that I have ~200 small models (each for a different signal, each trained individually, so combining them into a larger model is off the table). These models are used for predictions with a relatively fast cycle time.
Currently they run in a for-loop like so:
start_time = perf_counter()
for model in modelList:
    modelInput = getModelInput()  # example dummy for extracting model-specific input data
    prediction = model(modelInput, training=False).numpy()
end_time = perf_counter()
print(f'Prediction loop execution: {(end_time - start_time) * 1000:.2f} ms')
With this code the whole execution takes around 1,000 ms.
After researching the topic (including multiprocessing, which is already used in other parts of the project) I tried to use a multiprocessing.Pool. Unfortunately this led to other issues ...
Trying to pass the model with a multiprocessing.Manager like
# simplified code example
manager = multiprocessing.Manager()
modelList = manager.list()
modelList.append(model)
led to the terminal output
INFO:tensorflow:Assets written to: ram://4893b60f-addb-42cc-8f73-ecd08f4345f1/assets
and took forever ...
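That INFO line is the clue: everything that crosses a process boundary (a `manager.list()` append, a Pool argument) has to be pickled, and current Keras versions implement pickling by exporting the whole model as a SavedModel (the `ram://.../assets` message) and re-importing it on the other side. A stdlib-only sketch of the mechanism, with a hypothetical `FakeModel` standing in for a Keras model:

```python
import pickle

class FakeModel:
    """Hypothetical stand-in for a Keras model."""
    export_count = 0  # counts how often the "model" was serialized

    def __reduce__(self):
        # Keras hooks pickling in a similar way: every pickle of the model
        # triggers a full SavedModel export (the ram://... INFO line).
        FakeModel.export_count += 1
        return (FakeModel, ())

def ship_across_process_boundary(model):
    # This round trip is what a Manager append or a Pool argument does.
    return pickle.loads(pickle.dumps(model))
```

So appending ~200 models to a `manager.list()` means ~200 full model exports before a single prediction has run.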
Next I tried a workaround I found: loading each model inside the worker process
def poolPrediction(modelPath: str):
    model = load_model(modelPath, compile=False)
    modelInput = getModelInput()
    prediction = model(modelInput, training=False).numpy()
    return prediction
start_time = perf_counter()
with multiprocessing.Pool(4) as predictionPool:
    results = predictionPool.map_async(poolPrediction, modelPaths).get()
end_time = perf_counter()
print(f'Prediction loop execution: {(end_time - start_time) * 1000:.2f} ms')
it takes even longer (~1,700 ms).
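Part of that time is a fixed cost paid on every task: `load_model` runs once per prediction, so with 4 workers each process repeats the load ~50 times. A common pattern is to load the models once per worker via the Pool's `initializer`, so each task only looks up an already-loaded model. A minimal sketch of that pattern, where `fake_load` and `fake_predict` are hypothetical stand-ins for `load_model` and the actual prediction call:

```python
import multiprocessing

def fake_load(path):
    """Hypothetical stand-in for load_model(path, compile=False)."""
    return {"path": path}

def fake_predict(model):
    """Hypothetical stand-in for model(input, training=False).numpy()."""
    return model["path"].upper()

_models = {}  # per-worker cache, filled once when the worker starts

def init_worker(paths):
    # Runs once in each worker process: pay the load cost here,
    # not on every single prediction task.
    for path in paths:
        _models[path] = fake_load(path)

def poolPrediction(path):
    # The task itself only does the prediction.
    return fake_predict(_models[path])

def run_pool(paths, workers=4):
    with multiprocessing.Pool(workers, initializer=init_worker,
                              initargs=(paths,)) as pool:
        return pool.map(poolPrediction, paths)
```

Even with the loads moved out of the tasks, each task still ships its input and result through inter-process pickling, and at ~5 ms of work per model that IPC overhead can dominate, which is one reason process pools often lose on workloads this small.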
I'm sure I'm missing something huge and obvious here and therefore I'm grateful for the smallest hint or idea!