12

I am looking for a fast way to bin a 2D numpy array numerically. By binning I mean computing submatrix averages or cumulative values. For example, x = numpy.arange(16).reshape(4, 4) would be split into 4 submatrices of 2x2 each, giving numpy.array([[2.5, 4.5], [10.5, 12.5]]), where 2.5 = numpy.average([0, 1, 4, 5]), etc.

How can such an operation be performed efficiently? I don't really have any idea how to do this.

Many thanks...

user1187727

3 Answers

20

You can use a higher dimensional view of your array and take the average along the extra dimensions:

In [12]: a = np.arange(36).reshape(6, 6)

In [13]: a
Out[13]: 
array([[ 0,  1,  2,  3,  4,  5],
       [ 6,  7,  8,  9, 10, 11],
       [12, 13, 14, 15, 16, 17],
       [18, 19, 20, 21, 22, 23],
       [24, 25, 26, 27, 28, 29],
       [30, 31, 32, 33, 34, 35]])

In [14]: a_view = a.reshape(3, 2, 3, 2)

In [15]: a_view.mean(axis=3).mean(axis=1)
Out[15]: 
array([[  3.5,   5.5,   7.5],
       [ 15.5,  17.5,  19.5],
       [ 27.5,  29.5,  31.5]])

In general, if you want bins of shape (a, b) for an array of shape (rows, cols), reshape it with .reshape(rows // a, a, cols // b, b). Note that the order of the .mean calls matters: a_view.mean(axis=1).mean(axis=3) will raise an error, because a_view.mean(axis=1) has only three dimensions. a_view.mean(axis=1).mean(axis=2) works fine, but makes it harder to understand what is going on.
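As a minimal sketch of that general rule, the reshape can be wrapped in a small helper. The name bin_mean and its parameters bin_rows and bin_cols are hypothetical, chosen here for illustration; the tuple form of axis assumes NumPy 1.7 or later.

```python
import numpy as np

def bin_mean(arr, bin_rows, bin_cols):
    """Average non-overlapping (bin_rows, bin_cols) blocks of a 2D array."""
    rows, cols = arr.shape
    if rows % bin_rows or cols % bin_cols:
        raise ValueError("bin shape must divide the array shape exactly")
    view = arr.reshape(rows // bin_rows, bin_rows, cols // bin_cols, bin_cols)
    # Axes 1 and 3 index the position inside each block, so reduce over them.
    return view.mean(axis=(1, 3))

x = np.arange(16).reshape(4, 4)
binned = bin_mean(x, 2, 2)
```

Replacing .mean with .sum in the last line of the helper should give the cumulative value of each bin instead of its average.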

As is, the above code only works if you can fit an integer number of bins inside your array, i.e. if a divides rows and b divides cols. There are ways to deal with other cases, but you will have to define the behavior you want then.
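One possible behavior for the non-divisible case, shown only as an assumption and not as the recommended one, is to drop the trailing rows and columns that do not fill a complete bin. The helper name bin_mean_cropped is hypothetical.

```python
import numpy as np

def bin_mean_cropped(arr, bin_rows, bin_cols):
    """Average full bins only; leftover rows/columns at the end are discarded."""
    rows, cols = arr.shape
    n_r, n_c = rows // bin_rows, cols // bin_cols
    # Keep only the part of the array that holds a whole number of bins.
    cropped = arr[:n_r * bin_rows, :n_c * bin_cols]
    return cropped.reshape(n_r, bin_rows, n_c, bin_cols).mean(axis=(1, 3))

y = np.arange(35).reshape(5, 7)      # 5 rows and 7 columns, bins of 2x3
result = bin_mean_cropped(y, 2, 3)   # uses the top-left 4x6 part only
```

Padding the array (for example with NaN and a NaN-aware mean) would be another choice; which one is right depends on the data.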

Jaime
  • 5
    With numpy 1.7 you can squash it together into `.mean(axis=(1,3))`! – seberg Feb 17 '13 at 01:19
  • 1
    I didn't know this kind of reshaping was possible, great! Unfortunately the average is order-dependent, so how do I get the average of, for example, a 2x2 submatrix in your example (I mean the corner 0, 1, 6, 7, etc.)? – user1187727 Feb 17 '13 at 15:29
  • 1
    @user1187727 I don't think I am understanding your question, but the average of `[[0, 1], [6, 7]]` is item `[0, 0]` of `a_view.mean(axis=3).mean(axis=1)`. – Jaime Feb 17 '13 at 15:36
  • I think @user1187727 was confused with the fact that the reshaped array (with `reshape(3, 2, 3, 2)`) is *not* the array of 2x2 submatrices, though the result is anyway correct. – shaman.sir Jul 19 '17 at 18:45
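To illustrate the point raised in the comments, here is a small sketch of how the 4D view is indexed. It reuses a and a_view from the answer; the names corner, first_row_pairs and blocks are introduced here for illustration only.

```python
import numpy as np

a = np.arange(36).reshape(6, 6)
a_view = a.reshape(3, 2, 3, 2)

# a_view[i, :, j, :] is the 2x2 block in block-row i and block-column j.
corner = a_view[0, :, 0, :]

# a_view[0, 0] alone is not a block: it is row 0 of `a` split into three pairs.
first_row_pairs = a_view[0, 0]

# Moving the two within-block axes to the end gives an explicit array of blocks.
blocks = a_view.swapaxes(1, 2)          # shape (3, 3, 2, 2)
block_means = blocks.mean(axis=(2, 3))  # same values as a_view.mean(axis=3).mean(axis=1)
```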
1

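See the SciPy Cookbook on rebinning, which provides this snippet (adapted here to run on Python 3: explicit numpy import, integer division, print as a function):

import numpy as np

def rebin(a, *args):
    '''rebin ndarray data into a smaller ndarray of the same rank whose dimensions
    are factors of the original dimensions. eg. An array with 6 columns and 4 rows
    can be reduced to have 6,3,2 or 1 columns and 4,2 or 1 rows.
    example usages:
    >>> a=np.random.rand(6,4); b=rebin(a,3,2)
    >>> a=np.random.rand(6); b=rebin(a,2)
    '''
    shape = a.shape
    lenShape = len(shape)
    # Integer division: reshape needs integer sizes, and "/" yields floats on Python 3
    factor = np.asarray(shape) // np.asarray(args)
    # Build the expression a.reshape(args[0],factor[0],...).sum(1)...  /factor[0]...
    evList = ['a.reshape('] + \
             ['args[%d],factor[%d],'%(i,i) for i in range(lenShape)] + \
             [')'] + ['.sum(%d)'%(i+1) for i in range(lenShape)] + \
             ['/factor[%d]'%i for i in range(lenShape)]
    # Show the expression that is about to be evaluated
    print(''.join(evList))
    return eval(''.join(evList))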
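The same reduction can be written without building a string for eval. The following is a sketch under that assumption, not the Cookbook's code; rebin_no_eval is a hypothetical name, and as in the snippet above the positional arguments are the desired output shape.

```python
import numpy as np

def rebin_no_eval(a, *new_shape):
    """Average `a` down to `new_shape`; each new size must divide the old one."""
    factors = np.asarray(a.shape) // np.asarray(new_shape)
    # Interleave output sizes and block sizes: (n0, f0, n1, f1, ...).
    interleaved = tuple(int(v) for pair in zip(new_shape, factors) for v in pair)
    # The odd-numbered axes hold the positions inside each block.
    reduce_axes = tuple(range(1, 2 * a.ndim, 2))
    return a.reshape(interleaved).mean(axis=reduce_axes)

small_2d = rebin_no_eval(np.arange(24.0).reshape(6, 4), 3, 2)
small_1d = rebin_no_eval(np.arange(6.0), 2)
```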
Paul Price
0

I assume that you want to know how to build, in general, a function that performs well and operates on arrays, just like numpy.reshape in your example. If performance really matters and you're already using numpy, you can write your own C code for that, as numpy does. For example, the implementation of arange is completely in C. Almost everything in numpy that matters for performance is implemented in C.

However, before doing so you should try to implement the code in Python and see if the performance is good enough. Try to make the Python code as efficient as possible. If it still doesn't suit your performance needs, go the C way.

You may read about that in the docs.
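Before going the C route, the "measure first" step suggested above could look roughly like this. It is a sketch only: the function names bin_mean_loops and bin_mean_reshape, the array size and the repeat count are arbitrary assumptions, and the timings depend on the machine.

```python
import timeit
import numpy as np

def bin_mean_loops(arr, bin_rows, bin_cols):
    """Straightforward Python loops over the bins (baseline)."""
    out = np.empty((arr.shape[0] // bin_rows, arr.shape[1] // bin_cols))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            block = arr[i * bin_rows:(i + 1) * bin_rows,
                        j * bin_cols:(j + 1) * bin_cols]
            out[i, j] = block.mean()
    return out

def bin_mean_reshape(arr, bin_rows, bin_cols):
    """Vectorized version using a 4D reshape."""
    rows, cols = arr.shape
    view = arr.reshape(rows // bin_rows, bin_rows, cols // bin_cols, bin_cols)
    return view.mean(axis=(1, 3))

data = np.random.rand(200, 200)
t_loops = timeit.timeit(lambda: bin_mean_loops(data, 2, 2), number=3)
t_reshape = timeit.timeit(lambda: bin_mean_reshape(data, 2, 2), number=3)
print("loops: %.4f s, reshape: %.4f s" % (t_loops, t_reshape))
```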

nemo