I want to use Cython to reduce the time it takes to add two arrays element-wise without using NumPy arrays. The fastest plain-Python approach I found is a list comprehension:
```python
def add_arrays(a, b):
    # Element-wise sum of two sequences, returned as a list
    return [m + n for m, n in zip(a, b)]
```
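For reference, here is a small self-contained check of what `add_arrays` returns on `array('i')` inputs: a plain `list`, not an `array`. The `add_arrays_map` variant is a hypothetical alternative spelling using `map` with `operator.add`, included only for comparison. It was not part of the measurements in this question.

```python
import operator
from array import array


def add_arrays(a, b):
    # Element-wise sum via a list comprehension; returns a list
    return [m + n for m, n in zip(a, b)]


def add_arrays_map(a, b):
    # Hypothetical alternative: same result using map with operator.add
    return list(map(operator.add, a, b))


a = array('i', [1, 2, 3])
b = array('i', [10, 20, 30])
print(add_arrays(a, b))      # [11, 22, 33]
print(add_arrays_map(a, b))  # [11, 22, 33]
```

Because the result is a `list`, turning it into `array('i', ...)` would be an extra copy.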
My Cython approach is a little more complicated:
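```cython
from array import array
from libc.stdlib cimport malloc, free
from cython cimport boundscheck, wraparound

@boundscheck(False)
@wraparound(False)
cpdef add_arrays_Cython(int[:] Aarr, int[:] Barr):
    cdef size_t i, I
    I = Aarr.shape[0]
    # With bounds checking off, a shorter Barr would be read out of range
    if <size_t> Barr.shape[0] != I:
        raise ValueError("Aarr and Barr must have the same length")
    # Allocate the result buffer on the heap, sized to the input
    cdef int *Carr = <int *> malloc(I * sizeof(int))
    if Carr == NULL:
        raise MemoryError()
    try:
        for i in range(I):
            Carr[i] = Aarr[i] + Barr[i]
        # Copy the C buffer into a Python list, then into an 'i' array
        result_as_array = array('i', [e for e in Carr[:I]])
    finally:
        free(Carr)  # release the malloc'd buffer on every path
    return result_as_array
```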
Note that I use `@boundscheck(False)` and `@wraparound(False)` to make it even faster.
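To make concrete what the two directives turn off: in plain Python, and in Cython with its default settings, indexing checks bounds and interprets negative indices as counting from the end. `@boundscheck(False)` skips the `IndexError` check and `@wraparound(False)` skips the negative-index handling, so the loop must only use in-range, non-negative indices. The Python snippet below only shows the checked behaviour that those directives remove.

```python
from array import array

a = array('i', [10, 20, 30])

# wraparound: a negative index counts from the end
last = a[-1]  # 30

# boundscheck: an out-of-range index raises IndexError
try:
    a[3]
    raised = False
except IndexError:
    raised = True

print(last, raised)  # 30 True
```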
Also, because the arrays are very large (640000 elements), I found that the code crashes if I simply declare `cdef int Carr[640000]`, so I used `malloc()`, which solved that problem. Lastly, I return the result as a Python `array` of type `'i'` (C `int`).
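As a rough size check, and to illustrate building the result without an intermediate list: an `array('i')` of n elements can be allocated in one step and then written in place through a `memoryview`. This is a pure-Python sketch of the data flow only, and it is not fast in pure Python. The helper name `add_into_preallocated` is hypothetical. With a 4-byte C `int` (common but platform-dependent, hence `itemsize`), 640000 elements need 2,560,000 bytes. That is more than the 1 MB default stack of a Windows main thread, which is a likely reason a local `cdef int Carr[640000]` crashes while heap memory does not. In Cython, the analogous step would presumably be to type such an output array as an `int[:]` memoryview and write to it in the typed loop. That restructuring is an assumption, not the code measured above.

```python
from array import array


def add_into_preallocated(a, b):
    # Hypothetical helper: allocate the output array once, then fill it in place
    n = len(a)
    out = array('i', [0]) * n       # n zero-initialised C ints
    view = memoryview(out)          # writable buffer view, format 'i'
    for i in range(n):
        view[i] = a[i] + b[i]
    view.release()                  # allow the array to be resized again
    return out


n = 640000
itemsize = array('i').itemsize      # bytes per C int on this platform
print(n * itemsize)                 # 2560000 when itemsize is 4

a = array('i', range(5))
print(add_into_preallocated(a, a))  # array('i', [0, 2, 4, 6, 8])
```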
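To profile the code, I ran the following:
```python
import array
import time

# add_arrays and add_arrays_Cython (defined above) must be importable here
a = array.array('i', range(640000))  # create integer array
b = a[:]                             # array to add

T = time.perf_counter()
for i in range(20):
    add_arrays(a, b)                 # Python list comprehension approach
print(time.perf_counter() - T)

T = time.perf_counter()
for i in range(20):
    add_arrays_Cython(a, b)          # Cython approach
print(time.perf_counter() - T)
```
The list-comprehension loop took 6.33 seconds and the Cython loop took 4.54 seconds.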
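A sketch of the same measurement with the standard `timeit` module, which uses `time.perf_counter` by default and makes it easy to take the best of several repeats to reduce noise. The helper name `best_time` is hypothetical. Passing `add_arrays_Cython` instead of `add_arrays` would time the Cython version the same way.

```python
import timeit
from array import array


def add_arrays(a, b):
    # Pure-Python list-comprehension version from the question
    return [m + n for m, n in zip(a, b)]


def best_time(func, n=640000, number=20, repeat=3):
    # Hypothetical helper: best total time (seconds) for `number` calls of func(a, b)
    a = array('i', range(n))
    b = a[:]
    timings = timeit.repeat(lambda: func(a, b), number=number, repeat=repeat)
    return min(timings)
```

For example, `best_time(add_arrays)` times 20 calls on 640000-element arrays, matching the loop above, and reports the fastest of three runs.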
So the Cython version cuts the run time by only about 28% (4.54 s versus 6.33 s, roughly 1.4× faster). I expected a speed-up closer to an order of magnitude or more, like the one NumPy gives.
What can I do to speed up the Cython code further? Are there any obvious bottlenecks in my code? I am new to Cython, so I may be misunderstanding something.