According to Wikipedia, there is no standard definition of percentile; however, they give a few possible definitions. The code you've posted appears to be closest to the Nearest Rank Method, but it's not quite the same.
The formula they give is
n = ceiling((P / 100) x N)
where N is the length of the list, P is the percentile, and n will be the ordinal rank. You've already done the division by 100. Looking at the examples they give, it's clear that the "ordinal rank" is the index in the list, but it's 1-relative. Thus, to get an index into a Java array, you'd have to subtract 1. Therefore, the correct formula should be
n = ceiling(percentile * N) - 1
Using the variables in your code, the Java equivalent would be
(int) Math.ceil(percentiles[i] * latencies.length) - 1
This is not quite the code you've written. When you cast a double to an int, the result is rounded toward 0, which for non-negative values is the equivalent of the "floor" function. So your code computes
floor(percentiles[i] * latencies.length)
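If percentiles[i] * latencies.length is not an integer, the result is the same either way, because the ceiling of a non-integer is exactly one more than its floor. However, if it is an integer, so that "floor" and "ceiling" are the same value, then the result will be different: your code yields an index one higher than the formula above.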
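To make the two cases concrete, here is a small sketch that compares the two index computations side by side. The class and method names (IndexComparison, truncatedIndex, nearestRankIndex) are made up for illustration and are not from your code; the inputs are chosen so that the products are exactly representable as doubles.

```java
public class IndexComparison {
    // What the cast does: truncation toward zero, i.e. floor for non-negative values.
    static int truncatedIndex(double fraction, int length) {
        return (int) (fraction * length);
    }

    // Nearest-rank formula: 1-based ordinal rank, minus 1 for a 0-based array index.
    static int nearestRankIndex(double fraction, int length) {
        return (int) Math.ceil(fraction * length) - 1;
    }

    public static void main(String[] args) {
        // 0.25 * 10 = 2.5 is not an integer: both methods give 2.
        System.out.println(truncatedIndex(0.25, 10) + " " + nearestRankIndex(0.25, 10));
        // 0.5 * 10 = 5.0 is an integer: the cast gives 5, nearest-rank gives 4.
        System.out.println(truncatedIndex(0.5, 10) + " " + nearestRankIndex(0.5, 10));
    }
}
```

The two methods only disagree when the product lands exactly on an integer, which is the case the rest of this answer is about.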
An example from Wikipedia is to compute the 40th percentile when the list is {15, 20, 35, 40, 50}. Their answer is to find the second item in the list, i.e. 20, because 0.40 * 5 = 2.0, and ceiling(2.0) = 2.0.
However, your code:
int index = (int) (percentiles[i] * latencies.length);
will result in index being 2, which isn't what you want, because that will give you the third item in the list, instead of the second.
So in order to match the Wikipedia definition, your computation of the index will need to be modified a little. (On the other hand, I wouldn't be surprised if someone comes along and says your computation is correct and Wikipedia is wrong. We'll see...)
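For what it's worth, here is a minimal sketch of the modified computation, checked against the Wikipedia list. The names (NearestRank, nearestRankIndex, percentile) are hypothetical, and the clamping is my own assumption rather than part of the Wikipedia formula: without it, a percentile of 0 would produce an index of -1. It also assumes the array is already sorted and that the percentile is passed as a fraction, as in your code.

```java
public class NearestRank {
    // Nearest-rank index: ceiling(fraction * N) - 1, converted to a 0-based index.
    static int nearestRankIndex(double fraction, int length) {
        int index = (int) Math.ceil(fraction * length) - 1;
        // Assumption: clamp into [0, length - 1] so that a fraction of 0.0
        // maps to the first element instead of index -1.
        return Math.max(0, Math.min(index, length - 1));
    }

    // Looks up the percentile value in an array that is already sorted ascending.
    static int percentile(int[] sorted, double fraction) {
        return sorted[nearestRankIndex(fraction, sorted.length)];
    }

    public static void main(String[] args) {
        int[] values = {15, 20, 35, 40, 50};
        // 0.40 * 5 = 2.0, ceiling is 2, index is 1: prints 20.
        System.out.println(percentile(values, 0.40));
        // 1.0 * 5 = 5.0, ceiling is 5, index is 4: prints 50.
        System.out.println(percentile(values, 1.0));
    }
}
```

One caveat: because the result hinges on whether the product is exactly an integer, floating-point rounding in fraction * length can tip some inputs to the neighbouring index, so it may be worth testing with the percentiles you actually use.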