
I have a model that consists of 150 sub-models (currently run in a for loop). To improve performance, I would like to split it into 150 separate models, so that for every request my server receives it sends 150 API requests, one to each model, and then combines the results (so the invocations run in parallel), essentially a map-reduce. A rough sketch of what I mean is below.
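
A minimal fan-out/fan-in sketch of this idea, assuming each of the 150 models sits behind its own SageMaker real-time endpoint (the endpoint names here are hypothetical):

```python
import json
from concurrent.futures import ThreadPoolExecutor

import boto3

runtime = boto3.client("sagemaker-runtime")
ENDPOINTS = [f"my-model-{i}" for i in range(150)]  # hypothetical endpoint names


def invoke(endpoint_name: str, payload: dict) -> dict:
    """Send one request to one endpoint and parse the JSON response."""
    response = runtime.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="application/json",
        Body=json.dumps(payload),
    )
    return json.loads(response["Body"].read())


def fan_out(payload: dict) -> list:
    """The 'map' step: invoke all endpoints in parallel and collect results."""
    with ThreadPoolExecutor(max_workers=50) as pool:
        results = list(pool.map(lambda name: invoke(name, payload), ENDPOINTS))
    return results  # the 'reduce' (combine) step runs over these results
```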

I thought about AWS SageMaker multi-model endpoints, but the documentation suggests that the use case is better suited to serial invocation than to parallel or concurrent runs.

In addition, I thought about creating a Lambda function that reads the model and scales accordingly (serverless), roughly as sketched below, but that sounds odd to me and like I would be missing SageMaker's use cases.
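
A rough sketch of the Lambda idea, assuming a pickled scikit-learn model stored in S3 (the bucket and key names are hypothetical); the model is loaded once per container at cold start and reused across invocations:

```python
import json
import pickle

import boto3

s3 = boto3.client("s3")
obj = s3.get_object(Bucket="my-model-bucket", Key="models/model-42.pkl")  # hypothetical location
model = pickle.loads(obj["Body"].read())  # loaded at cold start, reused while the container is warm


def handler(event, context):
    """Run inference for a single request routed to this Lambda."""
    features = json.loads(event["body"])["features"]
    prediction = model.predict([features])[0]
    return {"statusCode": 200, "body": json.dumps({"prediction": float(prediction)})}
```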

Thanks!

1 Answer


Are your models similarly sized? Concurrent requests should not be an issue as long as you choose an instance type to back the endpoint with enough workers to handle them. Check out the Real-Time Inference section of the SageMaker Pricing page to see the different instance types available; I would suggest tuning the instance type along with the instance count to handle your request volume.
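
As a sketch of what concurrent requests against a single multi-model endpoint could look like, assuming all 150 model artifacts are stored under the endpoint's S3 model prefix (the endpoint and artifact names here are hypothetical):

```python
import json
from concurrent.futures import ThreadPoolExecutor

import boto3

runtime = boto3.client("sagemaker-runtime")


def invoke_target(model_artifact: str, payload: dict) -> dict:
    """Hit one specific model hosted on the multi-model endpoint."""
    response = runtime.invoke_endpoint(
        EndpointName="my-multi-model-endpoint",  # hypothetical endpoint name
        TargetModel=model_artifact,              # e.g. "model-17.tar.gz"
        ContentType="application/json",
        Body=json.dumps(payload),
    )
    return json.loads(response["Body"].read())


def invoke_all(payload: dict) -> list:
    artifacts = [f"model-{i}.tar.gz" for i in range(150)]
    with ThreadPoolExecutor(max_workers=50) as pool:
        return list(pool.map(lambda a: invoke_target(a, payload), artifacts))
```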

Ram Vegiraju