numpy.histogram returns two outputs:

  • hist: the values of the histogram
  • bin_edges: the bin edges, of length len(hist)+1

Both are vectors, but in the example below the second vector has length 101, one more than the first, which has length 100:

import numpy as np
import matplotlib.pyplot as plt  # needed for the plt.plot call below
from numpy.random import randn

n = 100  # number of bins
X = randn(n) * .1
a, bins1 = np.histogram(X, bins=n)

The following shape error occurs if I then try plt.plot(bins1,a):

ValueError: x and y must have same first dimension, but have shapes (101,) and (100,)

Why, and how do I fix the unequal shape error so I can plot the histogram?

develarist
  • Why are you not just using `plt.hist`? – BigBen Nov 02 '20 at 18:17
  • Because I need to keep the data for normalization to probabilities of 1, not just print the graph – develarist Nov 03 '20 at 03:26
  • `pyplot.hist` returns the values of the histogram bins. – BigBen Nov 03 '20 at 03:29
  • @BigBen great, but how do I *transform* the "values of the histogram bins" into normalized probabilities that have a maximum of 1 after, and *plot* those transformed values after as a new bar histogram without running into a shape mismatch error with the unchanged `bin_edges` vector? some more discussion in the comments here stackoverflow.com/questions/64648202/… – develarist Nov 03 '20 at 05:43
  • Is what you want to call `plt.hist()` with `density=True`? – Timo Nov 03 '20 at 08:03
  • @Timo `density` does not alter the histogram whatsoever as was discussed in the comments here, where someone says "No do not do that": https://stackoverflow.com/questions/64648202/how-to-align-two-numpy-histograms-so-that-they-share-the-same-bins-index-and-al/64649670#64649670 – develarist Nov 03 '20 at 08:11

2 Answers

The unequal shapes occur because bin_edges, as the name implies, specifies the bin edges. Each bin has a left and a right edge, and adjacent bins share an edge, so n bins need n+1 edges and bin_edges has length len(hist)+1.

As already noted in the comments, an appropriate way to plot is plt.hist.
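If you want to keep the np.histogram output, rescale it, and then plot it yourself, one option is to reduce the 101 edges to 100 x-values, for example the bin midpoints. The following is only a sketch: the seeded generator, the names centers, widths and scaled, and the choice to divide by the largest count (one possible reading of "normalized probabilities that have a maximum of 1") are assumptions, not something taken from the question.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)   # seeded only so the sketch is reproducible (assumption)
n = 100                          # number of bins
X = rng.standard_normal(n) * 0.1

hist, bin_edges = np.histogram(X, bins=n)

# One x-value per bin: the midpoint of each pair of neighbouring edges.
centers = 0.5 * (bin_edges[:-1] + bin_edges[1:])
widths = np.diff(bin_edges)

# Rescale the counts so the tallest bar equals 1 (assumed meaning of "normalized").
scaled = hist / hist.max()

fig, ax = plt.subplots()
ax.bar(centers, scaled, width=widths, align="center")
# plt.show()  # uncomment to display the figure
```

Passing bin_edges[:-1] with align="edge" instead of the midpoints should give the same bars; either way x and y both have length 100, so the shape error does not arise.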

Timo

I had this question as well because I wanted to transform my data before doing a histogram but display the results un-transformed (e.g., just keep the autogenerated bin edges). The other answers here get you most of the way, but what I found useful was to do something like this:

h, bin_edges = np.histogram(np.log(X), bins=100)
plt.hist(X, bins=np.exp(bin_edges))

Of course, you could do this manually by just choosing your bin edges originally and passing them in to plt.hist without using np.histogram. But this was nice as the automated calculations simplified some things for me.
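For comparison, the manual route might look like the sketch below. It assumes strictly positive data (the log is undefined otherwise) and uses made-up log-normal samples; np.geomspace is used here because edges evenly spaced in log(X) are geometrically spaced in X, which should match what exponentiating the np.histogram edges produces.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
# Hypothetical positive data; not the X from the question.
X = rng.lognormal(mean=0.0, sigma=1.0, size=1000)

# 101 edges define 100 bins, evenly spaced in log(X).
edges = np.geomspace(X.min(), X.max(), 101)

fig, ax = plt.subplots()
counts, used_edges, _ = ax.hist(X, bins=edges)
ax.set_xscale("log")  # optional: makes the bins appear equally wide
# plt.show()  # uncomment to display the figure
```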

Eric C.