I have EC2 containers serving inference for ML models; the models need to be cached in memory to avoid cold starts. We use an LRU cache and select which model to serve based on a query parameter.
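For context, the caching layer looks roughly like this (a minimal sketch; `loader` stands in for our actual model-loading code, which pulls weights from disk/S3):

```python
from collections import OrderedDict

class ModelCache:
    """Keep at most max_size models in memory, evicting the least recently used."""

    def __init__(self, max_size, loader):
        self.max_size = max_size
        self.loader = loader          # callable: model_name -> loaded model
        self._cache = OrderedDict()   # insertion order tracks recency

    def get(self, model_name):
        if model_name in self._cache:
            self._cache.move_to_end(model_name)   # mark as most recently used
            return self._cache[model_name]
        model = self.loader(model_name)           # cold start: load into memory
        self._cache[model_name] = model
        if len(self._cache) > self.max_size:
            self._cache.popitem(last=False)       # evict the LRU entry
        return model
```

Each server runs one of these, so the cold-start cost is paid whenever a request lands on a server that hasn't seen that model recently.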
As the service scales out, our ELB naively round-robins traffic across targets, so the same models end up loaded in every server's cache. Ideally, we'd use the request path to always forward requests for the same model to the same server.
I can see how to set this up manually with path-based rules, but the rules would need adjusting every time the target group scales. Is there a way to supply some kind of hashing function for the path-based routing?
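To be concrete, the behavior I'm after resembles consistent hashing: the same path always maps to the same target, and scaling the target group only remaps a small fraction of models. A sketch of what such a routing layer would compute (server names are hypothetical; this is illustrative, not an AWS feature I know of):

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Map keys (e.g. model names from the request path) to servers so that
    adding or removing a server only remaps a small fraction of keys."""

    def __init__(self, servers, replicas=100):
        self.replicas = replicas      # virtual nodes per server, for balance
        self._ring = []               # sorted list of (hash, server)
        for server in servers:
            self.add(server)

    def _hash(self, key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def add(self, server):
        for i in range(self.replicas):
            bisect.insort(self._ring, (self._hash(f"{server}#{i}"), server))

    def remove(self, server):
        self._ring = [(h, s) for h, s in self._ring if s != server]

    def get(self, key):
        # First virtual node clockwise from the key's hash owns the key.
        idx = bisect.bisect(self._ring, (self._hash(key), "")) % len(self._ring)
        return self._ring[idx][1]
```

If the load balancer supported something like this per path, each model's cache would stay warm on one server instead of being duplicated across all of them.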