
I am not sure whether this is a bug or whether I am simply misinterpreting the output of matplotlib's cumulative histogram. What I expect is that, at a given x value, the corresponding y value tells me how many samples are <= x.

import matplotlib.pyplot as plt

X = [1.1, 3.1, 2.1, 3.9]
n, bins, patches = plt.hist(X, normed=False, histtype='step', cumulative=True)
plt.ylim([0, 5])
plt.grid()
plt.show()

[Figure: cumulative step histogram produced by the code above]

See the second vertical line at x ≈ 1.9? Shouldn't it be at 2.1, given the data in X? For example, at x = 3.1 I would expect to read "3 samples have a value <= 3.1".

So, basically what I would expect is something similar to this step plot.

plt.step(sorted(X), range(1, len(X)+1), where='post')
plt.ylim([0, 5])
plt.grid()

[Figure: step plot produced by the code above]

Edit:

I am using Python 3.4.3 and matplotlib 1.4.3.


1 Answer

If you do not set the bins parameter yourself, plt.hist chooses the bins for you: by default, 10 equal-width bins spanning the range of the data.

In [58]: n, bins, patches = plt.hist(X, normed=False, histtype='step', cumulative=True)

In [59]: bins
Out[59]: 
array([ 1.1 ,  1.38,  1.66,  1.94,  2.22,  2.5 ,  2.78,  3.06,  3.34,
        3.62,  3.9 ])

The return value bins shows the edges of the bins that matplotlib chose.
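To see where the step near x = 1.9 comes from, the default binning can be reproduced by hand. The following is only a sketch, assuming 10 equal-width bins between min(X) and max(X) with the last bin closed on the right; it is not matplotlib's actual implementation, and the names num_bins, edges and second_step are made up for illustration.

```python
# By-hand sketch of 10 equal-width bins; not matplotlib's actual code.
X = [1.1, 3.1, 2.1, 3.9]
num_bins = 10  # assumed default bin count

lo, hi = min(X), max(X)
width = (hi - lo) / num_bins
edges = [lo + i * width for i in range(num_bins + 1)]

# Count samples per bin; bins are half-open [left, right), except the
# last one, which also includes its right edge.
counts = [0] * num_bins
for x in X:
    i = min(int((x - lo) / width), num_bins - 1)
    counts[i] += 1

# Running total = height of the cumulative histogram over each bin.
cumulative = []
total = 0
for c in counts:
    total += c
    cumulative.append(total)

# Left edge of the bin where the cumulative count first reaches 2.
second_step = edges[cumulative.index(2)]
print([round(e, 2) for e in edges])
print(cumulative)             # [1, 1, 1, 2, 2, 2, 2, 3, 3, 4]
print(round(second_step, 2))  # 1.94
```

The sample 2.1 falls into the bin that starts at 1.94, so the cumulative count jumps to 2 at that bin's left edge rather than at 2.1 itself.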

It sounds like you want the values in X to serve as bin edges. You can get that by passing bins=sorted(X)+[np.inf]:

import numpy as np
import matplotlib.pyplot as plt

X = [1.1, 3.1, 2.1, 3.9]
bins = sorted(X) + [np.inf]
n, bins, patches = plt.hist(X, normed=False, histtype='step', cumulative=True, 
                            bins=bins)
plt.ylim([0, 5])
plt.grid()
plt.show()

yields a plot whose steps rise at the sample values 1.1, 2.1, 3.1 and 3.9.

The [np.inf] makes the right edge of the final bin extend to infinity. Matplotlib does not try to draw non-finite values, so all you see of the last bin is its left edge.
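As a cross-check that does not depend on matplotlib, the quantity the question asks for ("how many samples are <= x") can be computed directly with the standard library. This is a minimal sketch; count_le is a hypothetical helper name, not part of matplotlib or NumPy.

```python
from bisect import bisect_right

X = [1.1, 3.1, 2.1, 3.9]
xs = sorted(X)

def count_le(x):
    """Number of samples less than or equal to x (hypothetical helper)."""
    return bisect_right(xs, x)

# With bin edges placed at the sorted samples, the cumulative histogram
# steps up at each sample, so its height at an edge equals count_le(edge).
heights_at_edges = [count_le(edge) for edge in xs]
print(heights_at_edges)  # [1, 2, 3, 4]
print(count_le(3.0))     # 2
```

These are the heights the histogram with bins=sorted(X)+[np.inf] should reach at each of its left edges.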

unutbu
  • Oh I see, that makes sense -- I somehow wrongly assumed that this would/should be the default behavior. Thanks! –  Apr 23 '15 at 16:32