I have made an Azure Machine Learning Experiment which takes a small dataset (12x3 array) and some parameters and does some calculations using a few Python modules (a linear regression calculation and some more). This all works fine.

I have deployed the experiment and now want to throw data at it from the front-end of my application. The API call goes in and comes back with correct results, but it takes up to 30 seconds to calculate a simple linear regression. Sometimes it is 20 seconds, sometimes only 1 second. I even got it down to 100 ms once (which is what I'd like), but 90% of the time the request takes more than 20 seconds to complete, which is unacceptable.

I guess it has something to do with it still being an experiment, or still being in a development slot, but I can't find any setting to make it run on a faster machine.

Is there a way to speed up my execution?

Edit: To clarify: the varying timings are obtained with the same test data, simply by sending the same request multiple times. This made me conclude that my request is being put in a queue, that there is some start-up latency, or that I'm being throttled in some other way.
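
Something like this minimal sketch reproduces the behavior; the endpoint URL, API key and request payload are placeholders for my actual values:

```python
# Send the identical request repeatedly and print each round-trip time.
import json
import time

import requests

ENDPOINT_URL = "https://<region>.services.azureml.net/workspaces/<workspace>/services/<service>/execute?api-version=2.0"
API_KEY = "<your-api-key>"
BODY = json.dumps({"Inputs": {}, "GlobalParameters": {}})  # fill in your input schema

headers = {
    "Authorization": "Bearer " + API_KEY,
    "Content-Type": "application/json",
}

for i in range(10):
    start = time.time()
    requests.post(ENDPOINT_URL, headers=headers, data=BODY)
    print("call %d: %.0f ms" % (i, (time.time() - start) * 1000))
```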

1 Answer

First, I am assuming you are doing your timing test on the published AML endpoint.

When a call is made to the AML endpoint, the first call to each container must warm it up. By default a web service has 20 containers. Each container starts cold, and a cold container can cause a large (roughly 30 second) delay. In the response returned by the AML endpoint, only count requests that have the isWarm flag set to true. Hitting the service with many requests (relative to how many containers you have running) will get all of your containers warmed, as in the sketch below.
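
As a rough sketch (the URL, key and body below are placeholders, and exactly where the isWarm flag sits in the response may vary by service), warming every container could look like this:

```python
# Warm-up sketch: hit the endpoint with more concurrent requests than
# there are containers, so every cold container gets touched at least once.
import concurrent.futures
import json

import requests

ENDPOINT_URL = "https://<region>.services.azureml.net/workspaces/<workspace>/services/<service>/execute?api-version=2.0"
API_KEY = "<your-api-key>"
N_CONTAINERS = 20  # default number of containers per web service
BODY = json.dumps({"Inputs": {}, "GlobalParameters": {}})  # fill in your input schema

def call_endpoint(i):
    resp = requests.post(
        ENDPOINT_URL,
        headers={"Authorization": "Bearer " + API_KEY,
                 "Content-Type": "application/json"},
        data=BODY,
    )
    # Only calls served by an already-warm container count as "fast";
    # the exact field name/location in the JSON may differ per service.
    return i, resp.json().get("isWarm", False)

# Send roughly twice as many requests as containers so each one is hit.
with concurrent.futures.ThreadPoolExecutor(max_workers=N_CONTAINERS) as pool:
    for i, warm in pool.map(call_endpoint, range(N_CONTAINERS * 2)):
        print("request %d: isWarm=%s" % (i, warm))
```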

If you are sending out dozens of requests at once, the endpoint might be getting throttled. You can adjust the number of calls your endpoint can accept:

  1. Go to manage.windowsazure.com/
  2. Open the Azure ML section from the left bar
  3. Select your workspace
  4. Go to the Web Services tab
  5. Select your web service from the list
  6. Adjust the number of calls with the slider

By enabling debugging on your endpoint, you can get logs of the execution time for each of your modules. You can use this to determine whether a module is not running as you intended, which may add to the time.
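
If you want finer-grained numbers than the platform logs give you, you can also print your own timings from inside the Execute Python Script module; in classic Studio, standard output from the script shows up in the module's output log. A minimal sketch (the commented-out calculation stands in for whatever your module actually does):

```python
# Entry point that AML Studio calls inside an Execute Python Script
# module, with the connected datasets passed in as pandas dataframes.
import time

def azureml_main(dataframe1=None, dataframe2=None):
    start = time.time()
    # ... your linear regression and other calculations on dataframe1 ...
    print("calculation took %.0f ms" % ((time.time() - start) * 1000))
    # The module must return a tuple of dataframes.
    return dataframe1,
```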

Overall, there is some overhead when using the Execute Python Script module, but I'd expect this request to complete in under 3 seconds.

  • Your answer together with [this article](http://jiffie.blogspot.be/2015/10/trying-to-figure-out-azure-ml.html) made me understand the problem: by default, there are 20 containers assigned, and they all start cold. I wasn't doing concurrent calls, and I was updating quite a lot, so more often than not I was calling one of those cold containers, which took about 30 seconds. Only after hitting all containers once are you guaranteed to get a fast response. For testing, I downgraded to 4 containers, which means I begin with 4 slow calls, after which all containers are warm. – JrtPec Jan 27 '16 at 14:27
  • Thanks, I'll review the article and add anything I missed to the answer. – Dan Ciborowski - MSFT Jan 27 '16 at 16:07
  • These "cold" containers are an inconvenience though. Suppose I update the web service: the first 20 requests the clients make will be very slow (and depending on the number of concurrent calls in the settings, it will be even worse). Is there a way to tell all containers to "warm up" after deployment? – JrtPec Jan 27 '16 at 20:12
  • The way that endpoints are managed overall is being reconsidered. Look for improvements this year that should simplify this experience. But for now, if you cannot afford the time delay of a cold container, check this out: https://github.com/paveldournov/RRSWarmer – Dan Ciborowski - MSFT Jan 28 '16 at 12:26
  • @DanCiborowski-MSFT Does this still apply with the new portal for web services? I still encounter slowness < 15s before a response is sent back. – 123 456 789 0 Oct 14 '16 at 07:07
  • @lll The way additional containers are warmed and added to the endpoint has changed since I originally wrote this... I believe the isWarm flag should still be there. Are you seeing it come back as false on your slow calls? I am a little out of date on the current deployment, but will try to update my answer with current information. – Dan Ciborowski - MSFT Oct 17 '16 at 13:42
  • @DanCiborowski-MSFT 4 years later, the problem still seems to persist. On top of that, the directions you gave for changing the number of concurrent calls seem to no longer apply. Microsoft changed everything. Unprecedented. Yaaaay! – Zizzipupp Jan 07 '20 at 13:57
  • @Zizzipupp are you still using Azure ML Studio? We have shifted to primarily using our new product, "Azure Machine Learning Services". I find that the new product is better at dealing with this type of problem, and would recommend you check that out instead, if you are looking for real production level scalability in a service. – Dan Ciborowski - MSFT Jan 07 '20 at 18:45
  • @DanCiborowski-MSFT thanks for suggesting that, it might be our next step. But does the new product provide control over the number of concurrent calls? The latency of my solution is scaring my users away. – Zizzipupp Jan 08 '20 at 14:41
  • @Zizzipupp web services are deployed to Kubernetes, where you control the size of the cluster, so yes. AML Studio took a bit of advanced knowledge to get the calls to an okay latency, and even then we weren't quite happy with the performance. Using Kubernetes should give you much finer control. Now, if you can precalculate your predictions and serve them in real time, I have a design published for using CosmosDB for low-latency serving: https://learn.microsoft.com/en-us/azure/architecture/reference-architectures/ai/real-time-recommendation – Dan Ciborowski - MSFT Jan 09 '20 at 21:09
  • Here is the generic design for a simple Real-time Scoring with Python using AML Services. https://learn.microsoft.com/en-us/azure/architecture/reference-architectures/ai/realtime-scoring-python – Dan Ciborowski - MSFT Jan 09 '20 at 21:10