I was using the following code to draw a boxplot with the 5th and 95th percentiles as the lower and upper whisker bounds. Surprisingly, I got two different plots with matplotlib 1.4.0 on Python 2.7.3 and matplotlib 2.2.0 on Python 3.6.5. Version 1.4.0 seems to put the upper whisker at the maximum value (49.33), while version 2.2.0 puts it at a value around 25, although the actual 95th percentile is 36.13. What could be the reason for these differences? And which one should be considered correct?

import numpy as np
import matplotlib.pyplot as plt

values = np.array([0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 49.33, 0.00, 0.00, 25.33])

fig, ax1 = plt.subplots()
# whis=[5.0, 95.0] requests whiskers based on the 5th and 95th percentiles
ax1.boxplot(values, whis=[5.0, 95.0], showfliers=False)
plt.show()

Boxplot using matplotlib 1.4.0 in python 2.7.3

Boxplot using matplotlib 2.2.0 in python 3.6.5

PyLabour
  • Can you try passing `whis` as floats instead of integers, e.g. `[5.0, 95.0]`? – hilberts_drinking_problem May 19 '18 at 16:38
  • I have just checked using floats in `whis`, but there is no change. – PyLabour May 19 '18 at 16:42
  • What numpy versions do you have? Do they return the same value for `np.percentile(values, 95)`? – Stop harming Monica May 19 '18 at 17:07
  • yes, both return 36.13 for np.percentile(values, 95) – PyLabour May 19 '18 at 17:10
  • I do not see any bugs, just different ways to choose the 95th percentile. What makes you think it is a bug? "cannot incorporate the 5th and 95th percentiles" they are both there. – Stop harming Monica May 19 '18 at 23:28
  • Could you please explain how "they are both there"? The plot from version 1.4.0 actually shows the maximum value (49.33) as the 95th percentile. The other plot does not show the theoretical 95th percentile of 36.13 either, but a value around 25. Could you please help me understand which one is correct and why? – PyLabour May 20 '18 at 01:21
  • I have edited my question to reflect that I want to know which of the two plots above is correct. – PyLabour May 20 '18 at 05:07

1 Answer

I think it's hard to say which one is correct, since the whisker positions simply depend on the definition used.

In the current matplotlib version, the whisker is drawn at the highest datum within the range determined by the whis parameter.
Here you use whis=[5.0,95.0], and the 95th percentile is ~36. The highest datum less than or equal to 36 is 25.33; hence the whisker is shown at that value.
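
The rule described above can be checked by hand with NumPy alone. This is only a minimal sketch of that rule, not matplotlib's actual source; the variable names p95 and upper_whisker are made up for illustration.

```python
import numpy as np

values = np.array([0.00, 0.00, 0.00, 0.00, 0.00, 0.00,
                   0.00, 0.00, 49.33, 0.00, 0.00, 25.33])

# Interpolated 95th percentile (NumPy's default linear interpolation).
p95 = np.percentile(values, 95)

# Whisker rule as described in the answer: the highest datum that
# does not exceed the percentile. (Illustrative name, not matplotlib API.)
upper_whisker = values[values <= p95].max()

print(p95, upper_whisker)
```

With these twelve values the percentile falls between the two largest data points (25.33 and 49.33), so the whisker snaps down to 25.33 rather than sitting at the interpolated 36.13.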

I do not know the definition used by the earlier boxplot implementation in matplotlib 1.4, but I could imagine it to be the lowest datum outside the percentile range given to whis; hence the whisker would be shown at 49.33.
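
To make that guess concrete, here is a small NumPy sketch of the alternative rule. It is purely an assumption about what matplotlib 1.4 might have done, not a confirmed description of its code; the name guessed_whisker is hypothetical.

```python
import numpy as np

values = np.array([0.00, 0.00, 0.00, 0.00, 0.00, 0.00,
                   0.00, 0.00, 49.33, 0.00, 0.00, 25.33])

# Interpolated 95th percentile of the data.
p95 = np.percentile(values, 95)

# ASSUMED alternative rule (a guess, not verified against matplotlib 1.4):
# take the lowest datum at or beyond the percentile.
guessed_whisker = values[values >= p95].min()

print(p95, guessed_whisker)
```

Under this assumed rule the only datum at or above the percentile is the maximum, which would match the whisker at 49.33 seen in the 1.4.0 plot.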

ImportanceOfBeingErnest