Python: convert continuous data into categorical

Question

I have continuous floating-point data ranging from -257.2 to 154.98, and I have no idea how it is distributed. I would like to put it into bins, say -270 to -201, -200 to -141, -140 to -71, -70 to -1, 0 to 69, 70 to 139, 140 to 209.

Is there a way to do this? Specifically, I am looking for something like:

data = np.random.rand(10)
data
array([ 0.58791019,  0.2385624 ,  0.70927668,  0.22916244,  0.87479326,
        0.49609703,  0.3758358 ,  0.35743165,  0.30816457,  0.2018548 ])
def GenRangedData(data, min, max, step):
    #some code
    no_of_bins = (max - min)/ step
    bins = []
    #some code
    return bins

rd = GenRangedData(data, 0, 1, 0.1)
# should generate one list per bin of width 0.1, from [0, 0.1) up to [0.9, 1.0):
rd
[[], [], [0.2385624, 0.22916244, 0.2018548], [0.3758358, 0.35743165, 0.30816457], [0.49609703], [0.58791019], [], [0.70927668], [0.87479326], []]

I can obviously do this by manually iterating over all the numbers, but I want to automate it so that I can experiment freely with min, max, and step. Is there an efficient way to do this?

Not sure what you're asking. So you don't want to loop over `data` and assign each item to the corresponding bin inside `GenRangedData()`? What are you hoping to achieve, if not that? — yelsayed, May 10 '16 at 06:44
I am looking for a library function that does this; iterating over the list myself does not seem efficient. — Adorn, May 10 '16 at 06:51
It's an `O(n)` operation; no built-in function can make this faster, you just _have_ to check all values. — yelsayed, May 10 '16 at 06:55
If your data is already sorted, it's possible you could do it faster using binary search. — Brendan Abel, May 10 '16 at 07:24
`np.histogram(a, bins=10, range=None, normed=False, weights=None, density=None)` — If `bins` is a sequence of numbers, they are the edges of the bins in which your data (the `a` array) is classified. — gboffi, May 10 '16 at 11:11
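
To make gboffi's comment concrete, here is a minimal sketch, assuming the bin edges implied by the ranges in the question and a handful of made-up sample values (`sample` is not from the original post). `np.histogram` with an explicit edge sequence gives the count per bin, and `np.digitize` gives a bin index per value, which is one way to turn the data into categories.

```python
import numpy as np

# Bin edges read off the ranges listed in the question. Each bin is
# half-open, [left, right), except the last one, which np.histogram
# treats as closed on both sides.
edges = np.array([-270, -200, -140, -70, 0, 70, 140, 210])

# Made-up sample values inside the range quoted in the question.
sample = np.array([-257.2, -150.0, -69.5, 0.0, 12.3, 99.9, 154.98])

# counts[i] is the number of values that fall into bin i.
counts, returned_edges = np.histogram(sample, bins=edges)

# np.digitize returns 1-based positions relative to the edges,
# so subtracting 1 gives a 0-based bin index for every value.
bin_index = np.digitize(sample, edges) - 1

print(counts.tolist())     # [1, 1, 0, 1, 2, 1, 1]
print(bin_index.tolist())  # [0, 1, 3, 4, 4, 5, 6]
```

Both calls still look at every value once, so this does not beat `O(n)`; the gain is that the loop runs inside NumPy instead of in Python.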

score 0 · Accepted Answer · answered May 10 '16 at 10:25

This is what I could come up with. I do not know if it is the best way; if you think it can be done faster, please update or edit.

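def GenRangedData(data, i_min, i_max, step):
    # data is a 2-D sequence (a list of rows); each value is replaced by a bin index
    cat_data = []
    bins = ((i_max - i_min) / step) + 2
    for x in range(0, len(data)):
        temp_data = []
        for y in range(0, len(data[x])):
            for n in range(0, int(bins)):
                # the first edge strictly greater than the value gives its index
                if data[x][y] < (i_min + (n*step)):
                    temp_data.append(n)
                    break
        # append once per row, inside the outer loop
        cat_data.append(temp_data)
    return cat_data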
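
As a possible alternative to the nested loops, here is a minimal sketch that groups the values per bin, as the question asked, using the standard-library `bisect` module. The function name `group_into_bins` and the choice to skip out-of-range values are my own assumptions, not part of the original post. The innermost scan over bins is replaced by a binary search over sorted edges, so arbitrary (also non-uniform) edges work.

```python
from bisect import bisect_right


def group_into_bins(values, edges):
    """Group values into len(edges) - 1 half-open bins [edges[i], edges[i + 1]).

    `edges` must be sorted in increasing order. Values outside
    [edges[0], edges[-1]) are skipped (an assumption of this sketch).
    """
    groups = [[] for _ in range(len(edges) - 1)]
    for value in values:
        # bisect_right counts the edges that are <= value, so subtracting 1
        # gives the index of the bin whose left edge is at or below the value.
        index = bisect_right(edges, value) - 1
        if 0 <= index < len(groups):
            groups[index].append(value)
    return groups


# The sample data from the question, with bins of width 0.1 between 0 and 1.
data = [0.58791019, 0.2385624, 0.70927668, 0.22916244, 0.87479326,
        0.49609703, 0.3758358, 0.35743165, 0.30816457, 0.2018548]
edges = [i / 10 for i in range(11)]  # 0.0, 0.1, ..., 1.0
groups = group_into_bins(data, edges)
```

The edges are built as `i / 10` rather than by repeatedly adding `0.1`, which keeps floating-point drift out of the bin boundaries. Values that sit exactly on an edge go to the bin on their right.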
