I recently found that the DescriptiveStatistics::getPercentile method in the org.apache.commons.math3.stat library uses a different approach to calculate the percentile of a given data set than the usual method. This SO answer explains the difference.

So it seems this is not a bug but an intentional decision. What is the reason for Apache Commons Math using a different method to calculate percentiles instead of the standard one?

Is there an assumption behind this method (for example, that the data set is large) or a practical trade-off (for example, performance over accuracy)?

Can someone explain the reason behind this algorithmic decision?

HarshaXsoad

1 Answer

As the answer to the referenced question shows, there are several different definitions of how to calculate a percentile.
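To make the difference concrete, here is a minimal, self-contained sketch of two common definitions. It does not use the library at all: the class and method names (PercentileDefinitions, valueAt, r6, r7) are hypothetical and chosen only for illustration. It assumes, based on the Percentile Javadoc, that the library's default estimates the position as p * (n + 1) / 100, while the Excel-style R-7 definition uses 1 + p * (n - 1) / 100. Both interpolate linearly between neighbouring sorted values.

```java
import java.util.Arrays;

public class PercentileDefinitions {

    /** Linear interpolation at a 1-based fractional position in sorted data. */
    static double valueAt(double[] sorted, double pos) {
        int n = sorted.length;
        if (pos <= 1) {
            return sorted[0];          // clamp to the smallest value
        }
        if (pos >= n) {
            return sorted[n - 1];      // clamp to the largest value
        }
        int lower = (int) Math.floor(pos);
        double frac = pos - lower;
        return sorted[lower - 1] + frac * (sorted[lower] - sorted[lower - 1]);
    }

    /** Position p * (n + 1) / 100 (assumed to match the library default). */
    static double r6(double[] data, double p) {
        double[] sorted = data.clone();
        Arrays.sort(sorted);
        return valueAt(sorted, p / 100.0 * (sorted.length + 1));
    }

    /** Position 1 + p * (n - 1) / 100 (the Excel-style R-7 definition). */
    static double r7(double[] data, double p) {
        double[] sorted = data.clone();
        Arrays.sort(sorted);
        return valueAt(sorted, 1 + p / 100.0 * (sorted.length - 1));
    }

    public static void main(String[] args) {
        double[] data = {1, 2, 3, 4, 5};
        System.out.println(r6(data, 25)); // prints 1.5
        System.out.println(r7(data, 25)); // prints 2.0
    }
}
```

The two positions differ by less than one rank for any p, so the gap between the results is most visible on small data sets like this one and shrinks as the number of samples grows.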

The Percentile class lets you select the definition you need, either with withEstimationType or through the specialized constructor. The example below selects R_7, the definition used by Excel:

Percentile percentile = new Percentile(quantile).withEstimationType(Percentile.EstimationType.R_7);

For DescriptiveStatistics, you can set the Percentile implementation that you prefer/need:

DescriptiveStatistics stats = new DescriptiveStatistics();
stats.setPercentileImpl(percentile);
T. Neidhart
  • But my question is why there are different methods. Does the choice depend on the distribution, on the size of the data set, or is it just for computational efficiency? The two methods return different results for the same data set when I ask for a percentile, so which one should I choose? (I guess it depends on the context, and that's why I need to know which to use when.) – HarshaXsoad Jun 17 '16 at 19:05