16

Given a long array of latencies which are in milliseconds, I want to calculate percentile from them. I got below method which does the work but I am not sure how I can verify whether this gives me accurate result?

  public static long[] percentiles(long[] latencies, double... percentiles) {
    Arrays.sort(latencies, 0, latencies.length);
    long[] values = new long[percentiles.length];
    for (int i = 0; i < percentiles.length; i++) {
      int index = (int) (percentiles[i] * latencies.length);
      values[i] = latencies[index];
    }
    return values;
  }

I would like to get 50th, 95th, 99th and 99.9th percentile from latencies array.

long[] percs = percentiles(latencies, 0.5, 0.95, 0.99, 0.999);

Is this the right way to get percentile given a long array of latencies? I am working with Java 7.

  • 1
    Note that not only does your `percentiles` method compute the percentile values (not always correctly--see my answer) and return the values, it also leaves the `latencies` array sorted, which is a side effect that may not be desirable. This is probably harmless in the small program you're trying to write, but in general it's not good practice for a method to have a side effect that isn't the purpose of the method. – ajb Jan 01 '17 at 07:06

4 Answers4

24

This is what you are looking for:

public static void main(String[] args) {
    List<Long> latencies = new List<Long>() { 3, 6, 7, 8, 8, 9, 10, 13, 15, 16, 20 };
    Collections.sort(latencies);

    System.out.println(percentile(latencies, 25));
    System.out.println(percentile(latencies, 50));
    System.out.println(percentile(latencies, 75));
    System.out.println(percentile(latencies, 100));
}

public static long percentile(List<Long> latencies, double percentile) {
    int index = (int) Math.ceil(percentile / 100.0 * latencies.size());
    return latencies.get(index-1);
}

enter image description here

peedee
  • 3,257
  • 3
  • 24
  • 42
user7358693
  • 503
  • 2
  • 3
  • 7
    Ummm, did you notice the Java tag on the question? – ajb Jan 01 '17 at 07:01
  • 1
    Quick translation to java: `public static double percentile(List values, double percentile) { Collections.sort(values); int index = (int) Math.ceil((percentile / 100) * values.size()); return values.get(index - 1); }` – LD Robillard Nov 16 '18 at 17:16
  • 2
    Its gonna crash when percentile is 0, but that is edge case I guess. – Jan Moravec Apr 13 '19 at 20:40
4
public static double percentile(double percentile, List<Double> items) {
    Preconditions.checkArgument(percentile >= 0);
    Preconditions.checkArgument(percentile <= 100);
    Preconditions.checkArgument(!items.isEmpty());

    Collections.sort(items);
    return items.get((int) Math.round(percentile / 100.0 * (items.size() - 1)));
}


@Test
public void test1() {
    List<Double> list = Arrays.asList(0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0);
    assertThat(percentile(0, list)).isEqualTo(0.0);
    assertThat(percentile(20, list)).isEqualTo(2.0);
    assertThat(percentile(80, list)).isEqualTo(8.0);
    assertThat(percentile(100, list)).isEqualTo(10.0);
}

@Test
public void test2() {
    List<Double> list = Arrays.asList(0.0, 1.0, 2.0, 3.0);
    assertThat(percentile(51, list)).isEqualTo(2.0);
    assertThat(percentile(49, list)).isEqualTo(1.0);
}

@Test
public void test3() {
    List<Double> list = Arrays.asList(42.0);     
    assertThat(percentile(0, list)).isEqualTo(42.0);
    assertThat(percentile(100, list)).isEqualTo(42.0);
}
Zsolt Safrany
  • 13,290
  • 6
  • 50
  • 62
2

According to Wikipedia, there is no standard definition of percentile; however, they give a few possible definitions. The code you've posted appears to be closest to the Nearest Rank Method, but it's not quite the same.

The formula they give is

n = ceiling((P / 100) x N)

where N is the length of the list, P is the percentile, and n will be the ordinal rank. You've already done the division by 100. Looking at the examples they give, it's clear that the "ordinal rank" is the index in the list, but it's 1-relative. Thus, to get an index into a Java array, you'd have to subtract 1. Therefore, the correct formula should be

n = ceiling(percentile * N) - 1

Using the variables in your code, the Java equivalent would be

(int) Math.ceil(percentiles[i] * latencies.length) - 1

This is not quite the code you've written. When you cast a double to an int, the result is rounded toward 0, i.e. it's the equivalent of the "floor" function. So your code computes

floor(percentiles[i] * latencies.length)

If percentiles[i] * latencies.length is not an integer, the result is the same either way. However, if it is an integer, so that "floor" and "ceiling" are the same value, then the result will be different.

An example from Wikipedia is to compute the 40th percentile when the list is {15, 20, 35, 40, 50}. Their answer is to find the second item in the list, i.e. 20, because 0.40 * 5 = 2.0, and ceiling(2.0) = 2.0.

However, your code:

int index = (int) (percentiles[i] * latencies.length);

will result in index being 2, which isn't what you want, because that will give you the third item in the list, instead of the second.

So in order to match the Wikipedia definition, your computation of the index will need to be modified a little. (On the other hand, I wouldn't be surprised if someone comes along and says your computation is correct and Wikipedia is wrong. We'll see...)

ajb
  • 31,309
  • 3
  • 58
  • 84
0

If the array is sorted, you should just return the relative element in your array (e.g., p99 in a 1000-elements array is the 990th element).

If the array is unsorted, and in order to calculate percentiles more efficiently, you should probably use something like Quickselect.

Moshe Bixenshpaner
  • 1,840
  • 1
  • 17
  • 23