35

I have a numpy array results that looks like

[ 0.  2.  0.  0.  0.  0.  3.  0.  0.  0.  0.  0.  0.  0.  0.  2.  0.  0.
  0.  0.  0.  1.  0.  0.  0.  0.  0.  0.  0.  1.  0.  0.  0.  0.  0.  0.
  0.  1.  1.  0.  0.  0.  0.  2.  0.  3.  1.  0.  0.  2.  2.  0.  0.  0.
  0.  0.  0.  0.  0.  1.  1.  0.  0.  0.  0.  0.  0.  2.  0.  0.  0.  0.
  0.  1.  0.  0.  0.  0.  0.  0.  0.  0.  0.  3.  1.  0.  0.  0.  0.  0.
  0.  0.  0.  1.  0.  0.  0.  1.  2.  2.]

I would like to plot a histogram of it. I have tried

import matplotlib.pyplot as plt
plt.hist(results, bins=range(5))
plt.show()

This gives me a histogram with the x-axis labelled 0.0 0.5 1.0 1.5 2.0 2.5 3.0 3.5 4.0.

I would like the x-axis to be labelled 0 1 2 3 instead, with the labels in the center of each bar. How can I do that?

Simd
  • I'm not sure what you want. Do you want the bins centered around 1,2,3 (so around the integer instead of the 1.5, 2.5 values)? Or do you want to label the bars with text or something? Because if I execute your command, my output is `(array([ 4., 5., 1., 2.]), array([0, 1, 2, 3, 4]))` (with different input values). So I have got different bins, or am I missing something? – Mathias711 Apr 23 '14 at 13:58
  • @Mathias711 The first bar is the number of 0s in `results`, the second the number of 1s (there are eleven of them), the third the number of 2s (there are eight of them) and the last one is the number of 3s (there are three of them). I would like the number `0` as a label under the middle of the first bar, the number `1` as a label under the middle of the second and so on. Is that clearer? – Simd Apr 23 '14 at 14:01
  • So there are no problems with the binning, you just want to add labels to the bins? – Mathias711 Apr 23 '14 at 14:02
  • @Mathias711 Yes I want to get rid of the default labels and add the ones I described. – Simd Apr 23 '14 at 14:03

7 Answers

60

The other answers just don't do it for me. The benefit of using plt.bar over plt.hist is that bar can use align='center':

import numpy as np
import matplotlib.pyplot as plt

arr = np.array([ 0.,  2.,  0.,  0.,  0.,  0.,  3.,  0.,  0.,  0.,  0.,  0.,  0.,
        0.,  0.,  2.,  0.,  0.,  0.,  0.,  0.,  1.,  0.,  0.,  0.,  0.,
        0.,  0.,  0.,  1.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  1.,  1.,
        0.,  0.,  0.,  0.,  2.,  0.,  3.,  1.,  0.,  0.,  2.,  2.,  0.,
        0.,  0.,  0.,  0.,  0.,  0.,  0.,  1.,  1.,  0.,  0.,  0.,  0.,
        0.,  0.,  2.,  0.,  0.,  0.,  0.,  0.,  1.,  0.,  0.,  0.,  0.,
        0.,  0.,  0.,  0.,  0.,  3.,  1.,  0.,  0.,  0.,  0.,  0.,  0.,
        0.,  0.,  1.,  0.,  0.,  0.,  1.,  2.,  2.])

labels, counts = np.unique(arr, return_counts=True)
plt.bar(labels, counts, align='center')
plt.gca().set_xticks(labels)
plt.show()
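For readers unfamiliar with np.unique, here is a minimal sketch of what return_counts=True hands back. The array `sample` is made up for illustration and is not the data from the question. Note that a value absent from the data gets no entry at all, so its bar is simply missing rather than drawn with height zero.

```python
import numpy as np

# Hypothetical sample: the value 2 never occurs
sample = np.array([0., 0., 1., 3., 3., 3.])

labels, counts = np.unique(sample, return_counts=True)
print(labels)  # sorted distinct values found in the data
print(counts)  # number of occurrences of each distinct value
```

Because plt.bar places each bar at its label value, the remaining bars still sit at the right x positions.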

centering labels in a histogram

Jarad
23

The following alternative solution is compatible with plt.hist() (which has the advantage, for instance, that you can call it after a pandas.DataFrame.hist()).

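import numpy as np
import matplotlib.pyplot as plt

def bins_labels(bins, **kwargs):
    # Width of one bin, assuming equally spaced bin borders
    bin_w = (max(bins) - min(bins)) / (len(bins) - 1)
    # One tick at the centre of each bin, labelled with that bin's left border
    # (the last border starts no bin, so it gets no label)
    plt.xticks(np.arange(min(bins)+bin_w/2, max(bins), bin_w), list(bins)[:-1], **kwargs)
    plt.xlim(bins[0], bins[-1])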

(The last line is not strictly requested by the OP but it makes the output nicer)

This can be used as in:

import matplotlib.pyplot as plt
bins = range(5)
plt.hist(results, bins=bins)
bins_labels(bins, fontsize=20)
plt.show()

Result: success!

Pietro Battiston
  • 1
    Could you explain why you only show the bins 0, 1, 2, 3 but use `range(5)`? – Wikunia Aug 01 '17 at 16:11
  • @Wikunia : sure. Bin 0 covers from 0 to 1 in the plot, bin 1 covers from 1 to 2... and so on until bin 3, which covers from 3 to 4 in the plot. So the bins (left and right) _borders_ must be the sequence [0, 1, 2, 3, 4]... which is precisely ``range(5)``. Strange, I know, but the only alternative I see (centering bin i going from i-1/2 to i+1/2) would be more complicated. – Pietro Battiston Aug 02 '17 at 23:07
  • This answer is efficient in a more general case, if bins are redefined, e.g. as `bins = np.arange(2, 7, .5)` – Dalker Nov 23 '17 at 10:12
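The point made in the comments about bin borders can be checked numerically. The sketch below uses a small made-up array (`data` is hypothetical) and np.histogram, which computes the same counts and borders that plt.hist draws.

```python
import numpy as np

data = np.array([0, 0, 1, 2, 2, 3])  # hypothetical sample of discrete values

# range(5) gives the five borders 0, 1, 2, 3, 4, which delimit four bins
counts, edges = np.histogram(data, bins=range(5))

# Midpoint of each bin: this is where a centred tick label belongs
centers = (edges[:-1] + edges[1:]) / 2

print(counts)
print(edges)
print(centers)
```

Four bins need five borders, and the midpoints fall on the half-integers, which is why the ticks have to be moved there explicitly.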
18

Here is a solution that only uses plt.hist(). Let's break this down in two parts:

  1. Have the x-axis labelled 0 1 2 3.
  2. Have the labels in the center of each bar.

To have the x-axis labelled 0 1 2 3 without .5 values, you can use the function plt.xticks() and provide as argument the values that you want on the x axis. In your case, since you want 0 1 2 3, you can call plt.xticks(range(4)).

To have the labels in the center of each bar, you can pass the argument align='left' to the plt.hist() function. Below is your code, minimally modified to do that.

import matplotlib.pyplot as plt

results = [0,  2,  0,  0,  0,  0,  3,  0,  0,  0,  0,  0,  0,  0,  0,  2,  0,  0,
           0,  0,  0,  1,  0,  0,  0,  0,  0,  0,  0,  1,  0,  0,  0,  0,  0,  0,
           0,  1,  1,  0,  0,  0,  0,  2,  0,  3,  1,  0,  0,  2,  2,  0,  0,  0,
           0,  0,  0,  0,  0,  1,  1,  0,  0,  0,  0,  0,  0,  2,  0,  0,  0,  0,
           0,  1,  0,  0,  0,  0,  0,  0,  0,  0,  0,  3,  1,  0,  0,  0,  0,  0,
           0,  0,  0,  1,  0,  0,  0,  1,  2,  2]

plt.hist(results, bins=range(5), align='left')
plt.xticks(range(4))
plt.show()
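To see what align='left' does, one can inspect the rectangles that hist returns. This is a minimal sketch with a made-up sample (`data` is hypothetical); it selects the non-interactive Agg backend only so that it runs without a display, which is an assumption about the environment and not something the answer requires.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; drop this line for interactive use
import matplotlib.pyplot as plt

data = [0, 0, 1, 2, 2, 3]  # hypothetical sample of discrete values

fig, ax = plt.subplots()
counts, edges, patches = ax.hist(data, bins=range(5), align='left')
ax.set_xticks(range(4))

# Horizontal centre of each drawn bar
centers = [p.get_x() + p.get_width() / 2 for p in patches]
print(centers)
```

With align='left' each bar is centred on the left border of its bin, so the bars for bins starting at 0, 1, 2, 3 sit directly over the integer ticks.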


ricpacca
  • Variant of the currently selected answer, no addition. – mins Dec 18 '20 at 11:57
  • 1
    @mins How so? This uses `plt.hist` instead of `plt.bar` for plotting a histogram, which seems to be the correct thing to do. – rvf Aug 09 '22 at 08:37
  • @rvf. My comment in 2020 wasn't referring to `plt.bar` in [Jarad's answer](https://stackoverflow.com/a/44463130/774575). Jarad's answer has been selected only recently (cf. "*This should be the answer*" posted in 2021) – mins Aug 09 '22 at 11:34
11

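You can build a bar plot out of np.histogram.

Consider this:

import numpy as np
import matplotlib.pyplot as plt

# `a` is the data array (called `results` in the question)
his = np.histogram(a, bins=range(5))  # his[0]: counts, his[1]: bin edges
fig, ax = plt.subplots()
offset = .4
# align='edge' starts each bar at its left bin edge (default width is 0.8)
ax.bar(his[1][:-1], his[0], align='edge')
ax.set_xticks(his[1][:-1] + offset)
ax.set_xticklabels(('0', '1', '2', '3'))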


EDIT: in order to get the bars touching one another, one has to play with the width parameter.

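fig, ax = plt.subplots()
offset = .5
# width=1 makes adjacent bars touch; align='edge' starts each bar at its left bin edge
ax.bar(his[1][:-1], his[0], width=1, align='edge')
ax.set_xticks(his[1][:-1] + offset)
ax.set_xticklabels(('0', '1', '2', '3'))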


Acorbe
0

As Jarad pointed out in his answer, a bar plot is a neat way to do it. Here's a short way of drawing one using pandas.

import pandas as pd
import matplotlib.pyplot as plt

arr = [ 0.,  2.,  0.,  0.,  0.,  0.,  3.,  0.,  0.,  0.,  0.,  0.,  0.,
        0.,  0.,  2.,  0.,  0.,  0.,  0.,  0.,  1.,  0.,  0.,  0.,  0.,
        0.,  0.,  0.,  1.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  1.,  1.,
        0.,  0.,  0.,  0.,  2.,  0.,  3.,  1.,  0.,  0.,  2.,  2.,  0.,
        0.,  0.,  0.,  0.,  0.,  0.,  0.,  1.,  1.,  0.,  0.,  0.,  0.,
        0.,  0.,  2.,  0.,  0.,  0.,  0.,  0.,  1.,  0.,  0.,  0.,  0.,
        0.,  0.,  0.,  0.,  0.,  3.,  1.,  0.,  0.,  0.,  0.,  0.,  0.,
        0.,  0.,  1.,  0.,  0.,  0.,  1.,  2.,  2.]

col = 'name'
pd.DataFrame({col : arr}).groupby(col).size().plot.bar()
plt.show()
  • "*barplot is a neat way to do it*". In some cases only. How would you do for [these cases](https://stackoverflow.com/q/41252078/774575)? – mins Dec 18 '20 at 12:00
  • Plot on the right could be created as Ted shows in his [answer](https://stackoverflow.com/a/41252153/10874583). To get plot on the left, size() should be used instead of sum(), i.e. `df.groupby(df.sold // 10 * 10).size().plot.bar()`. But i guess it's worth comparing results with other approaches. – Igor Kołakowski Dec 21 '20 at 12:39
  • 1
    I meant the difficulty in the linked case was about the use of `hist` with `weight` option. Cannot be replaced easily by `barplot` as it hasn't an equivalent possibility. – mins Dec 21 '20 at 15:44
0

To center the labels on a matplotlib histogram of discrete values, it is enough to define the "bins" as a list of bin boundaries.

import matplotlib.pyplot as plt
# %matplotlib inline  (IPython/Jupyter magic; uncomment when running in a notebook)

example_data = [0,0,1,0,1,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,0,1]

fig = plt.figure(figsize=(5,5))
ax1 = fig.add_subplot()
ax1_bars = [0,1]                           
ax1.hist( 
    example_data, 
    bins=[x for i in ax1_bars for x in (i-0.4,i+0.4)], 
    color='#404080')
ax1.set_xticks(ax1_bars)
ax1.set_xticklabels(['class 0 label','class 1 label'])
ax1.set_title("Example histogram")
ax1.set_yscale('log')
ax1.set_ylabel('quantity')

fig.tight_layout()
plt.show()


How does this work?

  • The histogram bins parameter can be a list defining the boundaries of the bins. For a class that can assume the values 0 or 1, the code above produces the boundaries [-0.4, 0.4, 0.6, 1.4], which loosely translates as "bin 0" is from -0.4 to 0.4 and "bin 1" is from 0.6 to 1.4 (the narrow bin from 0.4 to 0.6 stays empty and acts as the gap between the bars). Since the middle of each of those ranges is the discrete value itself, the label will be in the expected place.

  • The expression [x for i in ax1_bars for x in (i-0.4,i+0.4)] is just a way to generate the list of boundaries for a list of values (ax1_bars).

  • The expression ax1.set_xticks(ax1_bars) is important to set the x axis to be discrete.

  • The rest should be self explanatory.
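The boundary list described above can be checked without plotting. In this sketch, `discrete_edges` is a hypothetical helper name wrapping the same list comprehension, and the input data is made up.

```python
import numpy as np

def discrete_edges(values, half_width=0.4):
    """Hypothetical helper: boundaries v - half_width, v + half_width for each value v."""
    return [x for v in values for x in (v - half_width, v + half_width)]

edges = discrete_edges([0, 1])

# Count a small made-up sample with those boundaries
counts, _ = np.histogram([0, 0, 1, 0, 1], bins=edges)

print(edges)
print(counts)
```

The middle count belongs to the narrow gap bin between the two classes and stays at zero as long as the data only contains the listed values.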

Lucas
0

Use numpy to have bins centered at your requested values:

import matplotlib.pyplot as plt
import numpy as np
plt.hist(results, bins=np.arange(-0.5, 5))
plt.show()
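As a sketch of what those bin edges look like, and of one way to derive them from the data instead of hard-coding the upper limit: the array `data` below is made up, and the derived edges assume the data holds integer values.

```python
import numpy as np

data = np.array([0, 0, 1, 2, 2, 3])  # hypothetical sample of integer values

# Edges used in the answer: bins centred on 0, 1, 2, 3 and 4
fixed_edges = np.arange(-0.5, 5)

# Assumption: integer data; one bin centred on each integer from min to max
data_edges = np.arange(data.min() - 0.5, data.max() + 1.5)
counts, _ = np.histogram(data, bins=data_edges)

print(fixed_edges)
print(data_edges)
print(counts)
```

With the fixed edges there is one extra bin centred on 4, which stays empty when the largest value is 3; deriving the edges from the data avoids that.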
michael