8

I have a large array of doubles and I need to calculate the 75th and 90th percentile values for the array. What's the most efficient way to do this via a function?

Garry Pettet

3 Answers

23

It's been a while since I took statistics, so I could be off here, but here's a crack at it.

function get_percentile($percentile, $array) {
    sort($array);
    $index = ($percentile/100) * count($array);
    if (floor($index) == $index) {
         $result = ($array[$index-1] + $array[$index])/2;
    }
    else {
        $result = $array[floor($index)];
    }
    return $result;
}

$scores = array(22.3, 32.4, 12.1, 54.6, 76.8, 87.3, 54.6, 45.5, 87.9);

echo get_percentile(75, $scores), PHP_EOL;
echo get_percentile(90, $scores), PHP_EOL;
Mark Miller
19

The answer above can throw an undefined index notice if you ask for the highest percentile (100), and it does not return the same values as Excel's PERCENTILE function. You can see here an example of how it fails.
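
To see where the out-of-range access comes from, here is a minimal sketch of the index arithmetic from the first answer at the 100th percentile, reusing its $scores sample. It only checks whether the computed keys exist rather than calling the function itself.

```php
<?php
// Sample data from the first answer (nine values, so valid keys are 0..8).
$scores = array(22.3, 32.4, 12.1, 54.6, 76.8, 87.3, 54.6, 45.5, 87.9);
sort($scores);

// Same index formula as the first answer, evaluated for the 100th percentile.
$index = (100 / 100) * count($scores);

// The index is a whole number, so that function reads $array[$index - 1] and $array[$index].
$lastKeyExists = array_key_exists((int) $index - 1, $scores);
$indexKeyExists = array_key_exists((int) $index, $scores);

var_dump($lastKeyExists);  // bool(true)
var_dump($indexKeyExists); // bool(false)
```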

I've written a PHP function following the second variant described on Wikipedia, which is the one Excel uses. It also guards against a percentile outside the 0 to 100 range by clamping the value.

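function getPercentile($array, $percentile)
{
    // Clamp the requested percentile to the 0..100 range.
    $percentile = min(100, max(0, $percentile));
    // Reindex so the keys run 0..n-1, then sort ascending.
    $array = array_values($array);
    sort($array);
    // Fractional rank between the first (0) and last (n-1) element.
    $index = ($percentile / 100) * (count($array) - 1);
    $intPart = (int) floor($index);
    $fractionPart = $index - $intPart;

    // Start at the lower neighbour and interpolate linearly toward the next value.
    $result = $array[$intPart];
    if ($fractionPart > 0) {
        $result += $fractionPart * ($array[$intPart + 1] - $array[$intPart]);
    }

    return $result;
}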
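
As a quick check, here is a self-contained sketch that applies the same kind of linear interpolation to the $scores sample from the first answer. The helper name percentile_linear is made up for this example, it assumes a non-empty array, and the values in the comments were worked out by hand rather than compared against Excel.

```php
<?php
// Hypothetical helper for illustration: linear interpolation between the
// two nearest ranks. Assumes $values is a non-empty list of numbers.
function percentile_linear(array $values, float $p): float
{
    $p = min(100, max(0, $p));                    // clamp to 0..100
    sort($values);                                // ascending, keys reset to 0..n-1
    $index = ($p / 100) * (count($values) - 1);   // fractional rank
    $lower = (int) floor($index);
    $fraction = $index - $lower;

    $result = $values[$lower];
    if ($fraction > 0) {
        $result += $fraction * ($values[$lower + 1] - $values[$lower]);
    }
    return $result;
}

$scores = array(22.3, 32.4, 12.1, 54.6, 76.8, 87.3, 54.6, 45.5, 87.9);

$p75 = percentile_linear($scores, 75);
$p90 = percentile_linear($scores, 90);

echo round($p75, 2), "\n"; // 76.8
echo round($p90, 2), "\n"; // 87.42
```

With nine values the 75th percentile lands exactly on rank 6 of the sorted array, while the 90th falls at rank 7.2, so it is interpolated between 87.3 and 87.9.
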
Leon Husmann
Roger Codina
-1

Working off of Mark's function above, I believe the function should actually be:

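function get_percentile($percentile, $array) {
    sort($array);
    // Scale by (count - 1) so the index never runs past the last element.
    $index = ($percentile/100) * (count($array) - 1);
    if (floor($index) == $index) {
         return $array[(int) $index];
    }
    else {
        // PHP's round-up function is ceil(); there is no ceiling().
        return ($array[(int) floor($index)] + $array[(int) ceil($index)])/2;
    }
}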

I think there are three things that needed to be corrected:

  1. The count needs to be reduced by one to avoid an out-of-range index (mentioned above).
  2. If the calculated index is an integer, you can just return the value at that index. You only need to average values when the index is not an integer.
  3. For the average, instead of arbitrarily subtracting one from the index, it's better to use floor and ceil to get the two indices to average.
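
For comparison, here is a small sketch of what averaging the floor and ceil neighbours returns next to a weighted interpolation, using the sorted sample from the first answer at the 90th percentile. The variable names are just for illustration and the values in the comments were worked out by hand.

```php
<?php
// Sorted sample from the first answer (nine values).
$sorted = array(12.1, 22.3, 32.4, 45.5, 54.6, 54.6, 76.8, 87.3, 87.9);

// 90th percentile rank on a 0..n-1 scale: 0.9 * 8, about 7.2.
$index = (90 / 100) * (count($sorted) - 1);
$lo = (int) floor($index); // 7
$hi = (int) ceil($index);  // 8

// Plain average of the two neighbours (the approach in this answer).
$midpoint = ($sorted[$lo] + $sorted[$hi]) / 2;

// Neighbours weighted by the fractional part of the index.
$interpolated = $sorted[$lo] + ($index - $lo) * ($sorted[$hi] - $sorted[$lo]);

echo round($midpoint, 2), "\n";     // 87.6
echo round($interpolated, 2), "\n"; // 87.42
```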