
I have a sorted list and I want to group its values into a given number of equal-width buckets spanning its min and max values, e.g.

list = [-1.8, -1.7, -1.3, 0.6, 2.7, 3.1, 3.2]

after grouping (with 5 buckets)

[[-1.8, -1.7, -1.3], [], [0.6], [2.7], [3.1, 3.2]]

(The result does not have to be a list; any convenient data structure will do.) After some searching I found a solution using bisect:

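import bisect
import numpy as np

lo = list[0]   # smallest value (the list is sorted)
hi = list[-1]  # largest value
separators = np.linspace(lo, hi, 6).tolist()  # 6 edges for 5 buckets
grouped = [[] for _ in range(5)]
for x in list[:-1]:
    idx = bisect.bisect_right(separators, x)
    grouped[idx-1].append(x)
grouped[-1].append(hi)  # the maximum goes into the last bucket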

Later I found some more direct ways that almost achieve this goal, such as numpy.histogram:

hist, bins = numpy.histogram(list, bins=5)
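One possible way to get the values themselves, rather than only the counts, is to reuse the edges returned by numpy.histogram and pass the inner ones to numpy.digitize. The following is a minimal sketch under that assumption; the names `data`, `n_bins` and `grouped` are illustrative. Note that with equal-width buckets the width here is 1.0, so 2.7 ends up in the last bucket together with 3.1 and 3.2.

```python
import numpy as np

data = np.array([-1.8, -1.7, -1.3, 0.6, 2.7, 3.1, 3.2])
n_bins = 5  # illustrative name for the number of buckets

# counts per bucket and the n_bins + 1 equally spaced edges
counts, edges = np.histogram(data, bins=n_bins)

# Digitizing against the inner edges only yields indices 0 .. n_bins - 1,
# so the maximum lands in the last bucket, as it does in np.histogram.
idx = np.digitize(data, edges[1:-1])

grouped = [data[idx == i].tolist() for i in range(n_bins)]
print(grouped)
```

Because the bucket index comes from the same edges, the length of each group should match the corresponding entry of `counts`.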

or pandas.cut. But there is still a small gap to the result I want, i.e. getting the data in each bin. The closest thing I have found is this, which I think is essentially the same as bisect.
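For the pandas.cut route, one way to recover the data in each bin might be to ask for integer bin codes with `labels=False` and then select by code. This is only a sketch; `s`, `codes` and `grouped` are illustrative names. Unlike the bisect version, pandas.cut uses right-closed intervals by default, so a value lying exactly on an inner edge would fall into the lower bucket.

```python
import pandas as pd

data = [-1.8, -1.7, -1.3, 0.6, 2.7, 3.1, 3.2]
n_bins = 5  # illustrative name for the number of buckets

s = pd.Series(data)

# labels=False returns the integer bucket index (0 .. n_bins - 1) per value
codes = pd.cut(s, bins=n_bins, labels=False)

# select by code so that empty buckets are kept as empty lists
grouped = [s[codes == i].tolist() for i in range(n_bins)]
print(grouped)
```

Selecting by code rather than using groupby keeps the empty buckets in the result without extra options.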

So, is there a clear way to fill this gap?

chapayev
  • If your min and max are separated by 5, shouldn't you expect each bin to have a width of 1, so that your final bin contains everything from 2.2 to 3.2? That is what numpy and/or pandas is going to do. – Chris Jun 30 '22 at 15:31
  • Do you really have to use `numpy` or just pure `Python`? – Daniel Hao Jun 30 '22 at 15:57
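On the two points raised in the comments, a pure-Python version can compute the bucket index arithmetically from the bucket width. This is a rough sketch, not a definitive implementation; `bucketize` is a hypothetical helper name. For the example list the width is (3.2 - (-1.8)) / 5 = 1.0, so the last bucket covers 2.2 to 3.2.

```python
def bucketize(values, n_buckets):
    """Group values into n_buckets equal-width buckets between min and max.

    Hypothetical helper for illustration; assumes a non-empty sequence.
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_buckets
    buckets = [[] for _ in range(n_buckets)]
    for v in values:
        if width == 0:
            i = 0  # all values are equal, so everything goes in the first bucket
        else:
            # clamp so that the maximum falls into the last bucket
            i = min(int((v - lo) / width), n_buckets - 1)
        buckets[i].append(v)
    return buckets


data = [-1.8, -1.7, -1.3, 0.6, 2.7, 3.1, 3.2]
result = bucketize(data, 5)
print(result)
```

Since it does not rely on the input being sorted, the same helper also works on unsorted data, at the cost of two extra passes to find the min and max.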
