9

In my Python script I have floats that I want to bin. Right now I'm doing:

import numpy

min_val = 0.0
max_val = 1.0
num_bins = 20
# linspace returns bin edges, so num_bins bins need num_bins + 1 edges
my_bins = numpy.linspace(min_val, max_val, num_bins + 1)
hist, my_bins = numpy.histogram(myValues, bins=my_bins)

But now I want to add two more bins to account for values that are < 0.0 and for those that are >= 1.0. One bin should thus include all values in (-inf, 0), the other one all values in [1, inf).

Is there any straightforward way to do this while still using numpy's histogram function?

Ricky Robinson

3 Answers

11

The function numpy.histogram() happily accepts infinite values in the bins argument:

numpy.histogram(my_values, bins=numpy.r_[-numpy.inf, my_bins, numpy.inf])

Alternatively, you could use a combination of numpy.searchsorted() and numpy.bincount(), though I don't see much advantage to that approach.
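For comparison, the searchsorted/bincount alternative might look like the sketch below. The sample data and the names `inner_edges`, `idx` and `counts` are made up for illustration and are not from the question.

```python
import numpy as np

# Hypothetical sample data; names are illustrative only.
my_values = np.array([-0.5, 0.0, 0.3, 0.99, 1.0, 2.5])
inner_edges = np.linspace(0.0, 1.0, 5)  # 0.0, 0.25, 0.5, 0.75, 1.0

# side="right" returns i such that inner_edges[i-1] <= v < inner_edges[i].
# Index 0 collects v < 0.0; index len(inner_edges) collects v >= 1.0.
idx = np.searchsorted(inner_edges, my_values, side="right")

# minlength keeps the outer bins in the result even when they are empty.
counts = np.bincount(idx, minlength=len(inner_edges) + 1)
# counts -> [1, 1, 1, 0, 1, 2]
```

For this data the counts are the same as those from numpy.histogram() with -inf and inf added to the edges, which is why the extra code buys little here.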

jmetz
Sven Marnach
  • With matplotlib (`plt`), even though it uses numpy's `histogram` internally, `inf` is not accepted as a bin edge (drawing infinite boxes is too much? :-)). But a VERY large value (compared to the typical range of my data) worked well in my case. – Josiah Yoder Aug 03 '23 at 16:14
3

You can specify numpy.inf as the upper and -numpy.inf as the lower bin limits.
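A minimal worked example, assuming a small made-up data set and only two inner bins so the counts are easy to check by hand (the names `inner_edges` and `edges` are illustrative):

```python
import numpy as np

# Hypothetical sample data with values on both sides of [0, 1).
my_values = np.array([-2.0, -0.1, 0.0, 0.4, 0.6, 1.0, 3.0])
inner_edges = np.linspace(0.0, 1.0, 3)  # 0.0, 0.5, 1.0 -> two inner bins

# Put -inf and inf around the finite edges to catch the outliers.
edges = np.concatenate(([-np.inf], inner_edges, [np.inf]))
hist, edges = np.histogram(my_values, bins=edges)
# hist -> [2, 2, 1, 2]
```

The first entry of hist counts values below 0.0 and the last entry counts values of 1.0 and above.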

jmetz
0

With NumPy 1.16 you have histogram_bin_edges. A current solution calls histogram_bin_edges to get the bin edges, concatenates -inf and +inf, and passes the result as bins to histogram:

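import numpy as np
a = [1, 2, 3, 4, 2, 3, 4, 7, 4, 6, 7, 5, 4, 3, 2, 3]
# np.NINF and np.PINF were removed in NumPy 2.0; -np.inf and np.inf work in all versions
np.histogram(a, bins=np.concatenate(([-np.inf], np.histogram_bin_edges(a), [np.inf])))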

Results in:

(array([0, 1, 3, 0, 4, 0, 4, 1, 0, 1, 0, 2]),
array([-inf,  1. ,  1.6,  2.2,  2.8,  3.4,  4. ,  4.6,  5.2,  5.8,  6.4, 7. ,  inf]))

If you prefer to have the last bin empty (as I do), you can use the range parameter and add a small number to the maximum:

a = [1, 2, 3, 4, 2, 3, 4, 7, 4, 6, 7, 5, 4, 3, 2, 3]
np.histogram(a, bins=np.concatenate(([-np.inf], np.histogram_bin_edges(a, range=(np.min(a), np.max(a) + .1)), [np.inf])))

Results in:

(array([0, 1, 3, 0, 4, 4, 0, 1, 0, 1, 2, 0]),
array([-inf, 1.  , 1.61, 2.22, 2.83, 3.44, 4.05, 4.66, 5.27, 5.88, 6.49, 7.1 ,  inf]))
jboi