I have a sorted list, and I want to group its values into a given number of equal-width buckets spanning its minimum to its maximum value, e.g.
```python
list = [-1.8, -1.7, -1.3, 0.6, 2.7, 3.1, 3.2]
```
After grouping into 5 buckets, each of width (3.2 - (-1.8)) / 5 = 1.0:
```
[[-1.8, -1.7, -1.3], [], [0.6], [], [2.7, 3.1, 3.2]]
```
(The result doesn't have to be a list; any convenient data structure is fine.) After some searching, I found a solution using `bisect`:
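```python
import bisect
import numpy as np

lo, hi = list[0], list[-1]  # the list is sorted, so these are its min and max
separators = np.linspace(lo, hi, 6).tolist()  # 6 edges -> 5 equal-width buckets
grouped = [[] for _ in range(5)]
for x in list:
    idx = bisect.bisect_right(separators, x)  # number of edges <= x
    # Values equal to the max fall past the last edge; clamp them into the last bucket.
    grouped[min(idx, 5) - 1].append(x)
```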
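For comparison, the same bisect-style bucketing can be vectorized with NumPy. This is a minimal sketch, assuming a 1-D sequence of floats; `group_equal_width` is a hypothetical helper name. `np.searchsorted(..., side="right")` does what `bisect.bisect_right` does, but for every element at once, and clipping the index puts values equal to the maximum into the last bucket instead of special-casing them.

```python
import numpy as np

def group_equal_width(values, n_buckets):
    """Hypothetical helper: split values into n_buckets equal-width bins
    spanning [min, max]; the last bin is closed on the right."""
    arr = np.asarray(values, dtype=float)
    edges = np.linspace(arr.min(), arr.max(), n_buckets + 1)
    # Vectorized bisect_right: count of edges <= x, i.e. index of the first edge > x.
    idx = np.searchsorted(edges, arr, side="right") - 1
    # The maximum equals the last edge and would get index n_buckets; clip it back.
    idx = np.clip(idx, 0, n_buckets - 1)
    return [arr[idx == i].tolist() for i in range(n_buckets)]

data = [-1.8, -1.7, -1.3, 0.6, 2.7, 3.1, 3.2]
print(group_equal_width(data, 5))
# [[-1.8, -1.7, -1.3], [], [0.6], [], [2.7, 3.1, 3.2]]
```

Unlike the loop, this version does not rely on the input being sorted, and repeated maximum values are handled by the clip.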
Later I found some more direct ways to almost achieve this, such as `numpy.histogram`:
```python
hist, bins = numpy.histogram(list, bins=5)
```
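Because the list is already sorted, the counts returned by `numpy.histogram` are enough to recover the data in each bin: the cumulative counts are exactly the positions where one bin ends and the next begins. A minimal sketch, assuming sorted input (with unsorted input it silently produces wrong groups). `np.histogram` closes its last bin on the right, so the maximum ends up in the final group.

```python
import numpy as np

data = np.array([-1.8, -1.7, -1.3, 0.6, 2.7, 3.1, 3.2])  # must be sorted

hist, bins = np.histogram(data, bins=5)
# Cumulative counts give the index where each bin ends in the sorted array;
# dropping the last one leaves the n_bins - 1 split points np.split expects.
groups = np.split(data, np.cumsum(hist)[:-1])

print([g.tolist() for g in groups])
# [[-1.8, -1.7, -1.3], [], [0.6], [], [2.7, 3.1, 3.2]]
```

Empty bins simply become empty arrays, so the number of groups always equals the number of bins.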
or `pandas.cut`, but there is still a small gap to what I want, i.e., getting the actual data in each bin. The closest thing I found is this, which I think is essentially the same as the `bisect` approach.
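`pandas.cut` can also close the gap if it is asked for integer bin codes instead of interval labels (`labels=False`), after which the values are collected per code. A minimal sketch, assuming pandas is installed. Note that with an integer `bins`, `pandas.cut` extends the range of the data by 0.1% so that the minimum and maximum are included, so its edges can differ very slightly from `numpy.linspace`.

```python
import pandas as pd

data = [-1.8, -1.7, -1.3, 0.6, 2.7, 3.1, 3.2]
n_buckets = 5

# labels=False returns the integer bin index of each value instead of an Interval.
codes = pd.cut(data, bins=n_buckets, labels=False)

grouped = [[] for _ in range(n_buckets)]
for x, code in zip(data, codes):
    grouped[code].append(x)

print(grouped)
# [[-1.8, -1.7, -1.3], [], [0.6], [], [2.7, 3.1, 3.2]]
```

Building the list of buckets up front keeps empty bins in the result, which a plain `groupby` on the codes would drop.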
So, is there a clean way to fill this gap?