How to optimize speed to calculate the mean along the Z axis in a 3D array? Cython vs Numpy

Question

I am trying to speed up the calculation of the mean values along the Z axis in a 3d array. I read the documentation of cython to add types, memory views and so on, to accomplish this task. However, when I compare both: the function based on numpy and the other based on cython syntax and compilation in .so file, the first beats the second one. Is there a step, or type declaration I am getting wrong/missing in my code?

This is my numpy version: python_mean.py

    import numpy as np


    def mean_py(array):
        x = array.shape[1]
        y = array.shape[2]
        values = []
        for i in range(x):
            for j in range(y):
                values.append((np.mean(array[:, i, j])))

        values = np.array([values])
        values = values.reshape(500,500)
        return values

and this is my cython_mean.pyx file

     %%cython
     from cython import wraparound, boundscheck
     import numpy as np
     cimport numpy as np 

     DTYPE = np.double

     @boundscheck(False)
     @wraparound(False)
     def cy_mean(double[:,:,:] array):
        cdef Py_ssize_t x_max = array.shape[1]
        cdef Py_ssize_t y_max = array.shape[2]
        cdef double[:,:] result = np.zeros([x_max, y_max], dtype = DTYPE)
        cdef double[:,:] result_view = result
        cdef Py_ssize_t i,j
        cdef double mean
        cdef list values 
        for i in range(x_max):
            for j in range(y_max):
                mean = np.mean(array[:,i,j])
                result_view[i,j] = mean
        return result

When I import both functions and start doing calculation on a 3D numpy array I got the following:

    import numpy as np
    a = np.random.randn(250_000)
    b = np.random.randn(250_000)
    c = np.random.randn(250_000)

    array = np.vstack((a,b,c)).reshape(3, 500, 500)

    import mean_py
    from mean_py import mean_py
    %timeit mean_py(array)


    4.82 s ± 84.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)



    import cython_mean
    from cython_mean import cy_mean


    7.3 s ± 499 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Why such a low performance in the cython code? Thanks for your help

Cython is a generic solution to optimize Python codes. NumPy is a specific solution for math computing. So, for math computing, NumPy should win in most cases… — Laurent LAPORTE, May 04 '19 at 20:46
Always use %%cython -a too see what is really going on. The problem is using np.mean(). You could easily reach Numpy performance (very likely the numpy implementation is also written in Cython) if you write out the np.mean in a loop. — max9111, May 06 '19 at 09:35

score 4 · Accepted Answer · edited May 04 '19 at 21:55

Numpy solution

For this particular problem, using the axis parameter of numpy.mean likely is the fastest possible implementation (i.e. values = np.mean(array, axis=0)).

See benchmark below, numpy.mean appears nearly 1000x faster using your example.

In []: %timeit mean_py(array)
1.23 s ± 3.99 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In []: %timeit array.mean(0)
1.07 ms ± 3.76 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In []: np.all(array.mean(0) == mean_py(array))
Out[]: True

suggestion for your original approach

Not an explanation of why the cython version isn't faster, but a suggestion for how to improve the numpy-only version (avoiding list as a (slow) intermediate data structure) :

    import numpy as np


    def mean_py(array):
        x = array.shape[1]
        y = array.shape[2]
        #avoid creating values as list first
        #and create empty array instead
        values = np.empty((x,y), type(array[0][0][0]))
        for i in range(x):
            for j in range(y):
                #no more need for append operations
                values[i][j] = ((np.mean(array[:, i, j])))

        #no more need for conversion to array
        #values = np.array([values])
        #values = values.reshape(500,500)
        return values

Sadly I received the following Error message when Trying to assign the calculated mean value to the location in the array: `IndexError: index 500 is out of bounds for axis 0 with size 500 ` — Roger Almengor, May 04 '19 at 21:03

How to optimize speed to calculate the mean along the Z axis in a 3D array? Cython vs Numpy

1 Answers1

Numpy solution

suggestion for your original approach