
I'm using a container image with five ~170 MB AI models. When I invoke the function for the first time, all of those models are loaded into memory for later inference.

Problem: most of the time it takes about 10-25 seconds per file to load (so a cold start takes about 2 minutes). But sometimes the models load as expected, about 1-2 seconds each, and the cold start takes only 10 seconds.

After a little investigation I found that it all comes down to reading/opening a file from disk into memory. A simple "read a byte file from disk into a variable" takes 10-20 seconds. Insane.

P.S. I'm using functions with 10,240 MB of RAM, which should get the most processing power.

Is there any way I can avoid such long load times? Why does it happen?

UPDATE:

  • I'm using onnxruntime and Python to load the models
  • All code and models are stored in the container and opened/loaded from there
  • From an experiment: if I open any model with with open("model.onnx","rb") as f: cont = f.read() it takes 20 seconds to read the file. But when I then open the same file with model = onnxruntime.InferenceSession("model.onnx") it loads instantly. So I concluded that the problem is with opening/reading the file, not with ONNX (a timing sketch follows this list).
  • This also happens when reading big files in a "ZIP"-type function, so it doesn't look like a container problem.
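
For reference, the experiment above boils down to the following timing sketch (run from the function's working directory; the second open benefits from the file now being in the OS page cache):

import time
import onnxruntime

# Time a raw read of the model file from disk into memory.
t0 = time.time()
with open("model.onnx", "rb") as f:
    cont = f.read()
print(f"Raw read: {time.time() - t0:0.2f} s")

# Time opening the same (now cached) file via onnxruntime.
t0 = time.time()
session = onnxruntime.InferenceSession("model.onnx")
print(f"InferenceSession after read: {time.time() - t0:0.2f} s")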

TO REPRODUCE:

If you want to see this behaviour on your side:

  1. Create a Lambda function
  2. Configure it with 10,240 MB of RAM and a 30-second timeout
  3. Upload the ZIP from my S3: https://alxbtest.s3.amazonaws.com/file-open-test.zip
  4. Run/test the event. It took me 16 seconds to open the file.

The ZIP contains "model.onnx" (168 MB) and "lambda_function.py" with this code:

import json
import time

def lambda_handler(event, context):
    # Time how long it takes to read the model file from disk into memory.
    tt = time.time()
    with open("model.onnx", "rb") as f:
        cont = f.read()
    tt = time.time() - tt

    print(f"Open time: {tt:0.4f} s")

    return {
        'statusCode': 200,
        'body': json.dumps(f'Open time: {tt:0.4f} s')
    }
bezale
  • From _where_ is it loading the file? Within the context of an AWS Lambda function, what is the "disk" it is being loaded from? – John Rotenstein Oct 09 '21 at 21:13
  • Is the file constant? If so, maybe you can bundle it with your function? – Marcin Oct 09 '21 at 22:41
  • Also, how are you reading the file, and in what language? It would help to see the code used to read the file. – Ermiya Eskandary Oct 10 '21 at 07:00
  • @JohnRotenstein I'm using a [container image Link](https://aws.amazon.com/blogs/aws/new-for-aws-lambda-container-image-support/), which can be up to 10 GB in size. All models are stored in and opened from that container. – bezale Oct 10 '21 at 09:28
  • @ErmiyaEskandary please see "update" in my question – bezale Oct 10 '21 at 09:40
  • @Marcin please see "update" in my question – bezale Oct 10 '21 at 09:40
  • Take a look at https://docs.aws.amazon.com/lambda/latest/dg/configuration-layers.html - especially for machine learning. Have you tried using layers? – Ermiya Eskandary Oct 10 '21 at 09:41
  • @ErmiyaEskandary Layers aren't a great fit for my goals, as they can be no more than 50 MB and my model is 168 MB. But I've tried splitting the file into 50 MB chunks and loading them as layers, and they have the same long read time. No difference. – bezale Oct 10 '21 at 10:27
  • @AlexB. Ah, chunking was my idea - let me reproduce – Ermiya Eskandary Oct 10 '21 at 10:28
  • Do they all run as expected after 1-2 requests? Is this just a cold start problem, or can you consistently reproduce it over multiple consecutive requests? – Ermiya Eskandary Oct 10 '21 at 10:33
  • @ErmiyaEskandary This is only a cold start problem. The second invocation is fast. But a 2-minute cold start is not an option for me when sometimes it can load everything within 10 seconds (also a cold start). – bezale Oct 10 '21 at 10:38
  • @AlexB. Read https://stackoverflow.com/a/69512894/4800344 for the diagram - try setting provisioned concurrency to 100, as that should get rid of the cold start problem. Note you have to give it 1-2 minutes before it takes effect - your question is more of a cold start problem than a problem with Python or large files etc. – Ermiya Eskandary Oct 10 '21 at 10:40
  • @AlexB. Does that work? – Ermiya Eskandary Oct 10 '21 at 10:41
  • @AlexB. Also - are you reading the files at the same time or one by one in your actual code? – Ermiya Eskandary Oct 10 '21 at 10:45
  • @ErmiyaEskandary I'm reading them one by one. About provisioned concurrency: 1. I wanted to avoid it because of the additional cost. 2. The cold start of my function is actually pretty fast (up to 1 sec), but bad things happen when I try to read big files *the first time*. So I tried to separate model loading from function initialization. The cold start took me 1 sec. Then I call a model-loading method from the already initialized function and it reads them for 20 seconds each. – bezale Oct 10 '21 at 10:52
  • Fair enough - yes, the first time will take the longest while Lambda is essentially trying to "cache" and fetch the files - read all of the files at the same time, especially considering you have tons of memory - does that bring the cold start down to 20 seconds? – Ermiya Eskandary Oct 10 '21 at 10:55
  • @ErmiyaEskandary I'll try that when I find the right code for simultaneous file opening :) – bezale Oct 10 '21 at 11:06
  • @AlexB. I'll type something up, one sec – Ermiya Eskandary Oct 10 '21 at 11:07
  • @AlexB. Try - https://pastebin.com/E4HNCJ8c – Ermiya Eskandary Oct 10 '21 at 11:15
  • @ErmiyaEskandary Thanks! I'll get back with results when I've done all the tests. – bezale Oct 10 '21 at 11:23
  • @ErmiyaEskandary Unfortunately it works as before. The total time is still the same. As you said, it really looks like some kind of caching problem. But why does it sometimes load fast? This is what keeps me sleepless at night. I've been trying to solve this problem for almost two months. – bezale Oct 10 '21 at 11:33
  • @AlexB. It loads fast because the data is then cached in the Lambda runtime environment - can you please paste your updated code into the question? Also - is there a possibility of reading these files from S3? – Ermiya Eskandary Oct 10 '21 at 11:36
  • @AlexB. It may be much better to just download the file from S3 (Lambda has around 500 MB/s download speed) as opposed to bundling it? Try that – Ermiya Eskandary Oct 10 '21 at 11:37
  • @ErmiyaEskandary Maybe I wasn't clear enough about the "sometimes fast loading". Here is my workflow. I build the container (Docker) image locally from my code and models, then upload and deploy it with the [SAM aws utility](https://aws.amazon.com/serverless/sam/). Sometimes all cold starts (and model loading) work fast right after uploading a new update. But with another such update (and this happens more often) everything works slowly. I can redeploy the same image with no changes and it can be fast or slow, randomly. – bezale Oct 10 '21 at 11:42
  • @ErmiyaEskandary Loading from S3 is another thing I wanted to avoid, because downloading almost 1 GB of models each time can be very costly. I'll try this option anyway; maybe it will solve some model loading issues. – bezale Oct 10 '21 at 11:46
  • Uploading each time? No - upload once, then download – Ermiya Eskandary Oct 10 '21 at 11:46
  • @ErmiyaEskandary Each time a new concurrent function is invoked/cold started. With some decent traffic that can be pretty costly just for S3. Anyway, if it somehow solves the issue I'll definitely look into this option. But it would be great to understand why it sometimes works fast from the beginning. – bezale Oct 10 '21 at 12:08
  • @AlexB. Sorry! It's one for AWS support - best of luck! – Ermiya Eskandary Oct 10 '21 at 12:15

4 Answers


Lambda is not designed for big heavy lifting. Its design intent is small, quickly firing, low-scope functions. You have two options.

  1. Use an EC2 instance. This is more expensive, but it is a server and is designed for this kind of thing.

  2. Maybe try Elastic File System - this is another service that can be tied to Lambda. It provides a 'cross-invocation' file system that Lambdas can access almost as if it were internal, and that exists outside a single invocation of the Lambda. This lets you have large objects 'pre-loaded' into the file system that the Lambda can access, manipulate, and do whatever with, without first loading them into its internal memory (a minimal sketch follows).
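
By way of illustration, a minimal sketch of the EFS idea, assuming the function has an EFS access point mounted at /mnt/models (the mount path and file name here are hypothetical):

import onnxruntime

# Hypothetical EFS mount path; Lambda's file-system config must
# attach an EFS access point at this location.
MODEL_PATH = "/mnt/models/model.onnx"

# Load once at init time; warm invocations reuse the session.
session = onnxruntime.InferenceSession(MODEL_PATH)

def lambda_handler(event, context):
    # Run inference with the preloaded session here.
    return {"statusCode": 200}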

I noticed you also said AI models. There are services specifically for machine learning, such as SageMaker, that you may want to look into.

lynkfox
  • Not sure why Lambda would be limited to firing quick, low-scope functions? With 10 GB of RAM, 15-minute timeouts and multiple CPUs it can perfectly well handle heavy computation. – Marcin Oct 09 '21 at 22:36
  • It *can*, but its design philosophy was for much smaller workloads than that. Just because something can doesn't mean it's really designed for it. – lynkfox Oct 09 '21 at 23:03
  • I agree with @Marcin. Why have an EC2 instance running when you can do the same with Lambda? 10-20 seconds of Lambda pricing is still peanuts compared to EC2 instances with the same specs. – Lucasz Oct 10 '21 at 01:42
  • From [AWS Blog - AWS Lambda Container Support](https://aws.amazon.com/blogs/aws/new-for-aws-lambda-container-image-support/): "...you can now package and deploy Lambda functions as container images of up to 10 GB in size. In this way, you can also easily build and deploy larger workloads that rely on sizable dependencies, such as machine learning or data intensive workloads." Saying that Lambda is not designed for big heavy lifting is kind of outdated at this point. – Ervin Szilagyi Oct 10 '21 at 06:02
  • @lynkfox Actually a 10 GB RAM Lambda function is pretty OK for inference. I'm using onnxruntime (Python) and inference takes me about 1.2 sec, which is absolutely fine for me and the service I'm building. The problem, as stated before, is the very long and strange cold start. Strange because it can load all 5 models into memory in 10 secs or less, but more often it decides to take 20+ sec per model instead of 1.5 secs. I can't understand why an IO operation takes so long (again, not always!) – bezale Oct 10 '21 at 09:24
  • To the others: I still maintain that in practice Lambda is not good for these kinds of tasks - this comes from my experience over the last year - but what you need and are willing to accept as constraints always differs from person to person. @AlexB. This could simply be due to time of day and load on AWS servers. I do find that when working late at night the services generally respond much faster than in the middle of the day. That is simply conjecture, however; no proof to back it up. – lynkfox Oct 10 '21 at 14:22
  • @lynkfox Got it. One of the main priorities with my current project is to make it as cheap as I can at the start, and after doing the calculations I found that AWS Lambda would work perfectly for my needs. It also has "built-in" ways to grow smoothly under high traffic spikes. EC2 and SageMaker are much more expensive. Everything would be perfect if Lambda loaded those models fast rather than in 20+ sec. Furthermore, I know it can do that. So before trying more expensive options I hope to find a solution for those long file opens. Can you share your thoughts on how it could be solved? – bezale Oct 10 '21 at 17:17
  • As I mentioned in my answer above, EFS may be the solution you are looking for. Also, you could try putting the models in layers? I'm not sure whether that would work - I know you can import libraries from layers, but I'm not sure you can load files - and that may give you a bit more control over the load times, as layers are considered 'already loaded' at start-up. – lynkfox Oct 11 '21 at 04:44

SHORT ANSWER: you can't control the read/load speed of an AWS Lambda instance.

First of all, this problem comes down to the read/write speed of the particular Lambda instance you get. It looks like on the first invocation AWS looks for a free instance to place the function on, and those instances have different IO speeds.

More often than not you get about 6-9 MB/sec for reading, which is insanely slow for opening and working with big files.

Sometimes you are lucky and get an instance with 50-80 MB/sec reads, but that's pretty rare. Don't count on it.

So if you want faster speeds, you must pay more:

  • Use Elastic File System, as mentioned by @lynkfox
  • Use S3 (a download sketch follows this list)
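
A minimal sketch of the S3 option, assuming the model was uploaded once to a bucket (the bucket name here is hypothetical; /tmp is the only writable path in a Lambda function):

import boto3

s3 = boto3.client("s3")

# Download the model from S3 into Lambda's writable /tmp storage,
# then load it from there as usual.
s3.download_file("my-model-bucket", "model.onnx", "/tmp/model.onnx")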

BONUS:

If you aren't wedded to AWS, I've found Google Cloud Run much more suitable for my needs.

  • It uses Docker containers like AWS Lambda, is also billed per 100 ms, and can scale automatically
  • Read speed is pretty stable, at about 75 MB/sec
  • You can select RAM and vCPU separately, which can lower costs
  • You can load several big files simultaneously with multiprocessing, which makes the cold start much faster (in Lambda, the multiprocessing load time was the sum of all the loaded files - it didn't work for me); see the sketch after this list
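
For illustration, a minimal sketch of that kind of parallel loading (the file names are hypothetical); whether the reads truly overlap depends on the platform's IO, which is the difference the last bullet describes:

import concurrent.futures

# Hypothetical model file names bundled with the function.
MODEL_FILES = ["model1.onnx", "model2.onnx", "model3.onnx"]

def read_file(path):
    # Read one model file fully into memory.
    with open(path, "rb") as f:
        return f.read()

# Issue all reads at once; when the IO actually runs in parallel,
# total time is roughly that of the slowest single read.
with concurrent.futures.ThreadPoolExecutor() as pool:
    contents = list(pool.map(read_file, MODEL_FILES))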
bezale

The Init phase ends when the runtime and all extensions signal that they are ready by sending a Next API request. The Init phase is limited to 10 seconds. If all three tasks do not complete within 10 seconds, Lambda retries the Init phase at the time of the first function invocation.

Refer: https://docs.aws.amazon.com/lambda/latest/dg/lambda-runtime-environment.html

Check what the model load time is on an EC2 machine (or a CPU-only localhost). If it is close to 10 seconds, there is a high chance the model is being loaded twice. The second init generally happens quickly, as Lambda already has some of the content ready and the state loaded (see the sketch below).

To make the reads faster, others have suggested trying EFS. In addition, try EFS in Elastic Throughput mode.
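
To see whether the 10-second Init limit is the issue, a minimal sketch (the model path is hypothetical): load the model at module scope and log the timing; if the logged init-time load regularly exceeds ~10 seconds, Lambda will have retried the Init phase on the first invocation.

import time
import onnxruntime

# This runs during the Init phase, before any invocation.
_t0 = time.time()
session = onnxruntime.InferenceSession("model.onnx")  # hypothetical path
print(f"Init-time model load: {time.time() - _t0:0.2f} s")

def lambda_handler(event, context):
    # On warm invocations the session is already loaded.
    return {"statusCode": 200}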

anujs

A more expensive, but much faster, approach is to use ElastiCache, which is basically key:value pairs held in RAM.

https://aws.amazon.com/elasticache/pricing/
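
For illustration, a minimal sketch of that idea against a Redis-backed ElastiCache cluster (the endpoint and key are hypothetical, and the function's VPC must be able to reach the cluster):

import redis

# Hypothetical ElastiCache (Redis) endpoint reachable from the function's VPC.
r = redis.Redis(host="my-cache.abc123.0001.use1.cache.amazonaws.com", port=6379)

def get_model_bytes():
    # Serve the model bytes from the RAM-backed cache; fall back to
    # the bundled file on a cache miss and populate the cache.
    cached = r.get("model.onnx")
    if cached is not None:
        return cached
    with open("model.onnx", "rb") as f:
        data = f.read()
    r.set("model.onnx", data)
    return data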

rictuar