
I have an EC2 container serving inference for ML models, which need to be cached in memory to avoid cold starts. We are using an LRU cache and selecting the model based on a query parameter.
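To make the setup concrete, the serving side looks roughly like this (a simplified sketch; the cache size, the loading stub, and the parameter names are stand-ins, not our actual code):

```python
import time
from functools import lru_cache

def fetch_and_load(model_id: str):
    """Stand-in for the real S3 download + deserialize step (the slow part)."""
    time.sleep(2)               # simulate the couple-of-seconds cold start
    return {"id": model_id}     # pretend this is a loaded model object

@lru_cache(maxsize=32)          # hypothetical size; least-recently-used models get evicted
def get_model(model_id: str):
    return fetch_and_load(model_id)

def handle_request(params: dict):
    # The model is chosen by a query parameter, e.g. ?model=sentiment-v2
    model = get_model(params["model"])
    return model                # real code would run inference here
```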

As the container scales up, our naive ELB is doing round-robin to forward traffic, so the same models end up loaded in each server's cache. Ideally, we'd like to use the path to always forward requests for the same model to the same server.

I see how to do this manually, but as the target group scales, the rules would need to be adjusted. Is there a way to provide some kind of hashing function to the path-based route?
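Conceptually, the behaviour we're after is something like consistent hashing on the model path, so requests for a given model keep landing on the same instance even as the target group grows or shrinks. A rough, self-contained sketch of that mapping logic (the backend addresses are made up; this only illustrates the intent, it's not something we have today):

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Maps a request path to one backend; adding or removing a backend
    only remaps a small fraction of paths."""

    def __init__(self, backends, replicas=100):
        self.replicas = replicas
        self._ring = []                      # sorted list of (hash, backend)
        for backend in backends:
            self.add(backend)

    def _hash(self, key: str) -> int:
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def add(self, backend: str):
        for i in range(self.replicas):
            self._ring.append((self._hash(f"{backend}#{i}"), backend))
        self._ring.sort()

    def pick(self, path: str) -> str:
        # First virtual node clockwise from the path's hash position.
        idx = bisect.bisect(self._ring, (self._hash(path),)) % len(self._ring)
        return self._ring[idx][1]

ring = ConsistentHashRing(["10.0.0.11", "10.0.0.12", "10.0.0.13"])
print(ring.pick("/models/sentiment-v2"))   # same path always maps to the same backend
```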

1 Answer


You can have separate Target Groups for different models, i.e. one model per instance/container, with different URL paths to reach them. Or am I missing something?
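If you go down that road, an untested sketch of what it could look like with boto3 (the listener and target group ARNs and the model names below are placeholders): one path-pattern listener rule per model, each forwarding to that model's own target group.

```python
import boto3

elbv2 = boto3.client("elbv2")

# Placeholder ARNs and model names; substitute your own.
LISTENER_ARN = "arn:aws:elasticloadbalancing:region:account:listener/app/my-alb/xxx/yyy"
MODEL_TARGET_GROUPS = {
    "sentiment-v2": "arn:aws:elasticloadbalancing:region:account:targetgroup/sentiment-v2/zzz",
    "topic-model":  "arn:aws:elasticloadbalancing:region:account:targetgroup/topic-model/www",
}

# One path-pattern rule per model, each forwarding to its own target group.
for priority, (model, tg_arn) in enumerate(MODEL_TARGET_GROUPS.items(), start=10):
    elbv2.create_rule(
        ListenerArn=LISTENER_ARN,
        Priority=priority,
        Conditions=[{"Field": "path-pattern", "Values": [f"/models/{model}/*"]}],
        Actions=[{"Type": "forward", "TargetGroupArn": tg_arn}],
    )
```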

MLu
  • We're exposing inference for hundreds of models that are loaded dynamically from S3, so a given container serves several of them. Fetching and loading a model takes a couple of seconds, so we're caching them. We thought we could use ELB rules to keep the cache healthy as the containers scale up. I guess programmatically editing the target groups could be an option? I was hoping to avoid having to provide custom logic for this. – jminuscula Mar 08 '21 at 08:17