I have continuous floating-point data ranging from -257.2 to 154.98, and I have no idea how it is distributed. I want to sort it into bins, say -270 to -201, -200 to -141, -140 to -71, -70 to -1, 0 to 69, 70 to 139, 140 to 209.

Is there a way to do this? Specifically, I am looking for something like:

data = np.random.rand(10)
data
array([ 0.58791019,  0.2385624 ,  0.70927668,  0.22916244,  0.87479326,
        0.49609703,  0.3758358 ,  0.35743165,  0.30816457,  0.2018548 ])
def GenRangedData(data, min, max, step):
    #some code
    no_of_bins = (max - min)/ step
    bins = []
    #some code
    return bins

rd = GenRangedData(data, 0, 1, 0.1)
# should generate: 
rd
[[], [], [0.2385624, 0.22916244, 0.2018548], [0.3758358, 0.35743165, 0.30816457], [0.49609703], [0.58791019], [], [0.70927668], [0.87479326], []]

I can obviously do this by iterating over all the numbers manually, but I want to automate it so that I can experiment freely with min, max and step. Is there an efficient way to do this?

Adorn
  • Not sure what you're asking. So you don't want to loop over `data` and assign each item to the corresponding bin inside `GenRangedData()`? What are you hoping to achieve, if not that? – yelsayed May 10 '16 at 06:44
  • I am looking for a library function that does this; iterating over the list myself does not seem efficient. – Adorn May 10 '16 at 06:51
  • It's an `O(n)` operation; no built-in function can make this faster, you just _have_ to check all values. – yelsayed May 10 '16 at 06:55
  • If your data is already sorted, it's possible you could do it faster using binary search. – Brendan Abel May 10 '16 at 07:24
  • `np.histogram(a, bins=10, range=None, normed=False, weights=None, density=None)` — If `bins` is a sequence of numbers, they are the edges of the bins in which your data (the `a` array) is classified. – gboffi May 10 '16 at 11:11
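The `np.histogram` suggestion in the last comment can be sketched as follows. This is only an illustration: it reuses the sample values from the question, and the `edges` array (ten bins of width 0.1) is an assumption chosen to match the `GenRangedData(data, 0, 1, 0.1)` call. Note that `np.histogram` returns the number of values per bin, not the values themselves.

```python
import numpy as np

# Sample values copied from the question.
data = np.array([0.58791019, 0.2385624, 0.70927668, 0.22916244, 0.87479326,
                 0.49609703, 0.3758358, 0.35743165, 0.30816457, 0.2018548])

# Assumed bin edges: 11 edges give 10 bins of width 0.1 on [0, 1].
edges = np.linspace(0.0, 1.0, 11)

# Passing a sequence as `bins` makes np.histogram treat it as the bin edges.
counts, returned_edges = np.histogram(data, bins=edges)
print(counts)
```

For this sample, the counts per bin should come out as 0, 0, 3, 3, 1, 1, 0, 1, 1, 0. Every bin is half-open on the right except the last one, which also includes its upper edge.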

1 Answer

This is what I came up with. I do not know whether it is the best way; if you think it can be done faster, please update or edit.

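def GenRangedData(data, i_min, i_max, step):
    # data is a 2-D sequence (a list of rows). Each value is replaced by the
    # index n of the first edge i_min + n*step that lies above it.
    cat_data = []
    bins = int((i_max - i_min) / step) + 2
    for row in data:
        temp_data = []
        for value in row:
            for n in range(bins):
                if value < i_min + n * step:
                    temp_data.append(n)
                    break
        # Collect one list of bin indices per input row.
        cat_data.append(temp_data)
    return cat_data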
Adorn