
I am confused about the matplotlib hist function.

The documentation explains:

If a sequence of values, the values of the lower bound of the bins to be used.

But when I have two values in the sequence, i.e. [0, 1], I only get one bin. And when I have three, like so:

plt.hist(votes, bins=[0, 1, 2], density=True)

I only get two bins. My guess is that the last value is just an upper bound for the last bin.

Is there a way to put "the rest" of the values in the last bin, other than putting a very big value there? (In other words, without making that bin much bigger than the others.)
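One common workaround, though not the only one: leave the edges alone and clip the data instead, so anything above the last edge is pulled down onto it. Because the last bin is closed on the right, the clipped values are counted in the last bin without widening it. A minimal sketch with numpy, using made-up sample data; the clipped array can be passed to plt.hist(clipped, bins=edges) in the same way. Keep in mind that the last bar then means "2 or more" rather than "between 2 and 3".

```python
import numpy as np

# Hypothetical sample: 5 and 9 lie beyond the last edge of interest.
votes = [0, 0, 1, 2, 5, 9]
edges = [0, 1, 2, 3]

# Pull every value above the last edge down onto it (and every value below
# the first edge up onto it). The last bin is closed on the right, so values
# clipped to edges[-1] are counted in the last bin instead of being dropped.
clipped = np.clip(votes, edges[0], edges[-1])

counts, _ = np.histogram(clipped, bins=edges)
print(counts.tolist())  # [2, 1, 3]
```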

It seems like the last edge value is included in the last bin:

votes = [0,0,1,2]
plt.hist(votes, bins=[0,1])

This gives me one bin of height 3, i.e. 0, 0, 1, while:

votes = [0,0,1,2]
plt.hist(votes, bins=[0,1,2])

gives me two bins with two in each. I find this counterintuitive: adding a new bin changes the limits of the others.

votes = [0,0,1]
plt.hist(votes, bins=2)

yields two bins of size 2 and 1. These seem to have been split at 0.5, since the x-axis goes from 0 to 1.
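When bins is an integer, the edges are spaced evenly from the smallest to the largest value, so [0, 0, 1] with bins=2 gets the edges 0, 0.5 and 1. A quick check with numpy.histogram, which matplotlib's hist uses to compute its counts (assuming a reasonably recent matplotlib):

```python
import numpy as np

votes = [0, 0, 1]

# With an integer, numpy places bins + 1 evenly spaced edges
# between min(votes) and max(votes).
counts, edges = np.histogram(votes, bins=2)
print(edges.tolist())   # [0.0, 0.5, 1.0]
print(counts.tolist())  # [2, 1]
```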

How should the bins array be interpreted? How is the data split?

  • What version of `mpl` are you using? There was a change in `numpy`'s hist function a while ago that changed the meaning of the `bins` a bit, it is important to make sure you are looking at documentation that matches the versions you are using. – tacaswell Mar 02 '13 at 18:04
  • I am using version 1.6.1. Thank you for the note. – Christopher Käck Mar 02 '13 at 18:39

1 Answer

votes = [0, 0, 1, 2]
plt.hist(votes, bins=[0,1])

This gives you one bin of height 3, because the data is split into a single bin covering the interval [0, 1], closed at both ends. That bin receives the values 0, 0, and 1; the value 2 lies outside it and is not counted.
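This can be checked without plotting by calling numpy.histogram directly, assuming (as in matplotlib's own implementation) that plt.hist delegates the counting to it:

```python
import numpy as np

votes = [0, 0, 1, 2]

# Two edges define a single bin, [0, 1], closed on both sides:
# 0, 0 and 1 are counted, while 2 lies outside the range and is ignored.
counts, edges = np.histogram(votes, bins=[0, 1])
print(counts.tolist())  # [3]
```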

votes = [0, 0, 1, 2]
plt.hist(votes, bins=[0, 1, 2])

This gives you a histogram with two bins, [0, 1[ and [1, 2], so there are 2 items in the first bin (the two 0s) and 2 items in the second bin (the 1 and the 2).

If you try to plot:

plt.hist(votes, bins=[0, 1, 2, 3])

The idea behind splitting the data into bins is the same: you get three intervals, [0, 1[, [1, 2[ and [2, 3], and you will notice that the value 2 changes bins. It moves to [2, 3] instead of staying in [1, 2] as in the previous example, because the middle bin [1, 2[ is now open on the right.
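Comparing the two edge lists side by side with numpy.histogram (again assuming plt.hist counts the same way) shows the value 2 moving out of the second bin:

```python
import numpy as np

votes = [0, 0, 1, 2]

# With three edges, 2 sits on the closed right end of the last bin [1, 2].
two_bins, _ = np.histogram(votes, bins=[0, 1, 2])

# Adding the edge 3 makes [1, 2[ half-open, so 2 moves into the new last bin [2, 3].
three_bins, _ = np.histogram(votes, bins=[0, 1, 2, 3])

print(two_bins.tolist())    # [2, 2]
print(three_bins.tolist())  # [2, 1, 1]
```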

In conclusion, if you pass a sorted array [i_0, i_1, i_2, i_3, i_4, ..., i_n] as the bins argument, it creates n bins:
[i_0, i_1[
[i_1, i_2[
[i_2, i_3[
[i_3, i_4[
...
[i_(n-1), i_n]

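where each boundary is open or closed according to its bracket: a bracket pointing inward (as in "[i_0" or "i_n]") includes that boundary, and one pointing outward (as in "i_1[") excludes it. Every bin is therefore half-open except the last, which is closed at both ends.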
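To make the rule concrete, here is a small pure-Python sketch of it. bin_index and count_bins are hypothetical helpers written for illustration, not part of numpy or matplotlib, and numpy's own implementation works differently internally; the loop at the end compares the results with numpy.histogram for the examples above.

```python
import bisect

import numpy as np


def bin_index(value, edges):
    """Return the index of the bin holding value, or None if it is out of range.

    Every bin is [edges[k], edges[k + 1][ except the last,
    which is [edges[-2], edges[-1]] (closed on both sides).
    """
    if value < edges[0] or value > edges[-1]:
        return None
    if value == edges[-1]:
        return len(edges) - 2  # the right-most edge belongs to the last bin
    # bisect_right gives the position of the first edge strictly greater than value
    return bisect.bisect_right(edges, value) - 1


def count_bins(values, edges):
    """Count how many values fall into each of the len(edges) - 1 bins."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        k = bin_index(v, edges)
        if k is not None:
            counts[k] += 1
    return counts


votes = [0, 0, 1, 2]
for edges in ([0, 1], [0, 1, 2], [0, 1, 2, 3]):
    ours = count_bins(votes, edges)
    theirs = np.histogram(votes, bins=edges)[0].tolist()
    print(edges, ours, theirs)
```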

sissi_luaty
  • Another way to make this clear (and behave better) is to subtract `0.5` from your bin edges if you expect the values in `votes` to be integers, so you can sidestep these details about open/closed sets. – tacaswell Mar 02 '13 at 18:10
  • also +1 for super clear explanation of the details of the open/closed set issue. – tacaswell Mar 02 '13 at 18:13
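A sketch of the half-integer suggestion from the comments, assuming votes contains only integers: placing the edges at half-integers puts every integer in the middle of its own bin, so none of them ever lands on an edge. The same edges can be passed to plt.hist(votes, bins=bin_edges).

```python
import numpy as np

votes = [0, 0, 1, 2]

# Edges at -0.5, 0.5, 1.5, 2.5: one bin centred on each integer from min to max.
bin_edges = np.arange(min(votes) - 0.5, max(votes) + 1.5, 1.0)

counts, edges = np.histogram(votes, bins=bin_edges)
print(edges.tolist())   # [-0.5, 0.5, 1.5, 2.5]
print(counts.tolist())  # [2, 1, 1]
```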