Monitoring quality of service of API response time: which approach is better, median, 5-95 span, or upper95?

Question

I want to monitor the response time of an API. I can use statistics such as the average or the median, but I am facing the following problems with them:

Problem with average

If one request takes a very long time, the average is skewed. For example, in the following set the average becomes high because of the value 1000:
S1 = [1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 1000]
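
To make the skew concrete, here is a minimal Python sketch that uses the standard `statistics` module on the set S1 from the question. The variable names are only illustrative.

```python
import statistics

# Response times from the example set S1 (the question does not state units).
S1 = [1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 1000]

mean_s1 = statistics.mean(S1)      # pulled far upward by the single 1000
median_s1 = statistics.median(S1)  # unaffected by the single outlier

print(f"mean={mean_s1:.2f} median={median_s1}")
```

The mean comes out near 85 while the median stays at 2, although eleven of the twelve requests took 2 or less.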

Problem with median

It only describes the faster 50% of requests. For example, in the set S2 = [2, 2, 2, 2, 2, 50, 50, 50, 50] the median is 2, even though nearly half of the requests (4 of 9) get a slow response.
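
The following sketch shows how the median hides the slow requests in S2 while a high percentile exposes them. It assumes the nearest-rank definition of a percentile, which is only one of several common definitions, and `percentile_nearest_rank` is a hypothetical helper written for this illustration.

```python
import math

def percentile_nearest_rank(values, p):
    """Return the p-th percentile (0 < p <= 100) using the nearest-rank method."""
    ordered = sorted(values)
    rank = math.ceil(p / 100 * len(ordered))  # 1-based rank into the sorted list
    return ordered[max(rank, 1) - 1]

S2 = [2, 2, 2, 2, 2, 50, 50, 50, 50]

median_s2 = percentile_nearest_rank(S2, 50)  # 2: the slow requests are invisible
p95_s2 = percentile_nearest_rank(S2, 95)     # 50: the slow requests show up
slow_share = sum(1 for v in S2 if v >= 50) / len(S2)  # 4 of 9 requests

print(median_s2, p95_s2, f"{slow_share:.0%}")
```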

Problem with 5-95 span (http://steveakers.com/2013/08/01/span-vs-median-for-response-time-monitors/)

In the above article, the author suggests using the value upper95 - upper5. But that will not generate an alert if the response times look like S3 = [50, 50, 50, 50, 50]. In this case all API responses are slow, but the 5-95 span is zero.

I am thinking of using one of these two values: upper95 or (upper95 + upper5) / 2.
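
As a rough comparison, the sketch below computes the 5-95 span and both candidate values for the three example sets. It treats upper95 and upper5 as the 95th and 5th percentiles under the nearest-rank definition, which is an assumption. The helper names `percentile_nearest_rank` and `summarize` are hypothetical.

```python
import math

def percentile_nearest_rank(values, p):
    """Return the p-th percentile (0 < p <= 100) using the nearest-rank method."""
    ordered = sorted(values)
    rank = math.ceil(p / 100 * len(ordered))  # 1-based rank into the sorted list
    return ordered[max(rank, 1) - 1]

def summarize(values):
    """Compute the span and the two candidate metrics for one set of samples."""
    p5 = percentile_nearest_rank(values, 5)
    p95 = percentile_nearest_rank(values, 95)
    return {"p95": p95, "span": p95 - p5, "midpoint": (p95 + p5) / 2}

samples = {
    "S1": [1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 1000],
    "S2": [2, 2, 2, 2, 2, 50, 50, 50, 50],
    "S3": [50, 50, 50, 50, 50],
}

summary = {name: summarize(values) for name, values in samples.items()}
for name, stats in summary.items():
    print(name, stats)
```

Under these assumptions the span is 0 for S3 while upper95 and the midpoint are both 50, so either candidate would flag that case. For S2 the midpoint is pulled down to 26 by the fast requests, while upper95 stays at 50.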

Which one is better, and why? Is there a better method to calculate QoS?

score 1 · Answer 1 · answered Oct 14 '14 at 22:18

You listed three measurements:

1. Average (mean) response
2. Median response
3. 5-95 span response

Notice that #3 is not measuring the same thing as #1 and #2!

Mean and median give you a measure of the actual response time. This will pick up a certain class of problem.
5-95 span tells you to what extent your response time varies, that is, whether your response time is consistent or not. This will pick up another class of problem.

You probably need to track both the absolute response time and its variability. The best approach for the former (mean vs. median, whether to clip outliers) probably depends on the results you get for your service.
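
One way to track both at once is sketched below: a check that flags a window of response times on its level (95th percentile) and, separately, on its spread (5-95 span). The thresholds, the helper names, and the choice of the 95th percentile as the level measure are assumptions made for illustration, not recommendations.

```python
import math

def percentile_nearest_rank(values, p):
    """Return the p-th percentile (0 < p <= 100) using the nearest-rank method."""
    ordered = sorted(values)
    rank = math.ceil(p / 100 * len(ordered))  # 1-based rank into the sorted list
    return ordered[max(rank, 1) - 1]

def check_window(values, p95_limit, span_limit):
    """Flag a window of response times on level (p95) and spread (5-95 span)."""
    p5 = percentile_nearest_rank(values, 5)
    p95 = percentile_nearest_rank(values, 95)
    return {"too_slow": p95 > p95_limit, "too_variable": (p95 - p5) > span_limit}

# Hypothetical thresholds, chosen only for this illustration.
P95_LIMIT = 20
SPAN_LIMIT = 30

uniformly_slow = check_window([50, 50, 50, 50, 50], P95_LIMIT, SPAN_LIMIT)
one_outlier = check_window([1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 1000], P95_LIMIT, SPAN_LIMIT)
healthy = check_window([2, 2, 2, 2, 2], P95_LIMIT, SPAN_LIMIT)

print(uniformly_slow, one_outlier, healthy)
```

With these limits, the uniformly slow window trips only the level check, which is the case the span alone misses, while the window with a single large outlier trips both.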
