I'm using a container image with five ~170 MB AI models. The first time I invoke the function, all of those models are loaded into memory for later inference.
Problem: most of the time it takes about 10-25 seconds per file to load, so a cold start takes about 2 minutes. But sometimes the models load as expected, about 1-2 seconds each, and the cold start takes only 10 seconds.
After a little investigation I found that it comes down to reading/opening the file from disk into memory: a simple "read the byte file from disk into a variable" takes 10-20 seconds. Insane.
P.S. I'm using 10,240 MB RAM functions, which should get the most processing power.
Is there any way to avoid such long load times? Why does this happen?
UPDATE:
- I'm using onnxruntime and Python to load the model.
- All code and models are stored in the container and opened/loaded from there.
- From an experiment: if I open any model as
  with open("model.onnx", "rb") as f: cont = f.read()
  it takes 20 seconds to open the file. But when I then open the same file with
  model = onnxruntime.InferenceSession("model.onnx")
  it loads instantly. So I've concluded that the problem is with opening/reading the file, not with onnx.
- This also happens when reading big files in a "ZIP"-type function, so it doesn't look like a container-specific problem.
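For reference, here is a minimal, self-contained sketch of that experiment (it assumes onnxruntime is installed in the image and that model.onnx sits in the working directory; compare_load_paths is just an illustrative helper name). It simply times both load paths back to back:

    import time
    import onnxruntime

    def compare_load_paths(path="model.onnx"):
        # Plain byte read from disk into a variable.
        t0 = time.time()
        with open(path, "rb") as f:
            data = f.read()
        read_s = time.time() - t0

        # Building an ONNX Runtime session from the same file.
        t0 = time.time()
        session = onnxruntime.InferenceSession(path)
        session_s = time.time() - t0

        print(f"f.read(): {read_s:0.2f} s, InferenceSession: {session_s:0.2f} s")
        return data, session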
TO REPRODUCE:
If you want to see how this behaves on your side:
- Create a Lambda function.
- Configure it with 10,240 MB RAM and a 30-second timeout.
- Upload the ZIP from my S3 bucket: https://alxbtest.s3.amazonaws.com/file-open-test.zip
- Run a test event. It took me 16 seconds to open the file.
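If you prefer to script the setup rather than clicking through the console, a rough boto3 sketch along these lines should do it (the function name, role ARN, and bucket are placeholders for your own account; the ZIP is assumed to have been copied to a bucket in the same region):

    import boto3

    client = boto3.client("lambda")

    # Create the function with 10,240 MB of RAM and a 30-second timeout.
    client.create_function(
        FunctionName="file-open-test",                      # placeholder name
        Runtime="python3.9",
        Role="arn:aws:iam::123456789012:role/lambda-exec",  # placeholder execution role
        Handler="lambda_function.lambda_handler",
        Code={"S3Bucket": "my-bucket", "S3Key": "file-open-test.zip"},  # your copy of the ZIP
        MemorySize=10240,
        Timeout=30,
    )

    # Invoke it once and check the logs for the "Open time" line.
    client.invoke(FunctionName="file-open-test")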
The ZIP contains "model.onnx" (168 MB) and "lambda_function.py" with this code:
import json, time

def lambda_handler(event, context):
    # Time a plain binary read of the bundled model file.
    tt = time.time()
    with open("model.onnx", "rb") as f:
        cont = f.read()
    tt = time.time() - tt
    print(f"Open time: {tt:0.4f} s")
    return {
        'statusCode': 200,
        'body': json.dumps(f'Open time: {tt:0.4f} s')
    }