
I have been working with Cython and ran into a peculiar case: a function that sums an array takes about three times as long as one that computes the array's average.

Here are my three functions:

cpdef FLOAT_t cython_sum(cnp.ndarray[FLOAT_t, ndim=1] A):
    cdef double [:] x = A
    cdef double sum = 0
    cdef unsigned int N = A.shape[0]
    for i in xrange(N):
        sum += x[i]
    return sum

cpdef FLOAT_t cython_avg(cnp.ndarray[FLOAT_t, ndim=1] A):
    cdef double [:] x = A
    cdef double sum = 0
    cdef unsigned int N = A.shape[0]
    for i in xrange(N):
        sum += x[i]
    return sum/N

cpdef FLOAT_t cython_silly_avg(cnp.ndarray[FLOAT_t, ndim=1] A):
    cdef unsigned int N = A.shape[0]
    return cython_avg(A)*N

Here are the run times in IPython:

In [7]: A = np.random.random(1000000)


In [8]: %timeit np.sum(A)   
1000 loops, best of 3: 906 us per loop

In [9]: %timeit np.mean(A)
1000 loops, best of 3: 919 us per loop

In [10]: %timeit cython_avg(A)
1000 loops, best of 3: 896 us per loop

In [11]: %timeit cython_sum(A)
100 loops, best of 3: 2.72 ms per loop

In [12]: %timeit cython_silly_avg(A)
1000 loops, best of 3: 862 us per loop

I am unable to account for the jump in run time for the simple cython_sum. Is it caused by some memory allocation? The values are random numbers between 0 and 1, so the sum is around 500K.

Since line_profiler doesn't work with Cython, I was unable to profile my code.
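
As a rough sketch of one alternative, the standard-library cProfile module gives function-level (not line-level) timings. Cython documents a `profile=True` compiler directive that makes compiled functions visible to cProfile; that part is not shown here. The functions `py_sum` and `py_avg` below are hypothetical pure-Python stand-ins for cython_sum and cython_avg, and `profile_calls` is an illustrative helper, so treat the block as a pattern rather than a measurement of the original code.

```python
import cProfile
import io
import pstats


def py_sum(values):
    """Hypothetical pure-Python stand-in for cython_sum."""
    total = 0.0
    for v in values:
        total += v
    return total


def py_avg(values):
    """Hypothetical pure-Python stand-in for cython_avg."""
    return py_sum(values) / len(values)


def profile_calls(funcs, data):
    """Call each function once under cProfile and return the report as text."""
    profiler = cProfile.Profile()
    profiler.enable()
    for func in funcs:
        func(data)
    profiler.disable()

    # Collect the report in a string instead of printing it directly.
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats()
    return buffer.getvalue()


if __name__ == "__main__":
    data = [i / 1000.0 for i in range(100000)]
    print(profile_calls([py_sum, py_avg], data))
```

With the real extension module, the same helper could be pointed at the compiled functions once the module has been built with profiling enabled.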

Richie Abraham
  • 3
    On a side note, don't use sum as a variable name; it shadows the builtin sum function. – Padraic Cunningham Jun 17 '14 at 18:58
  • 2
    This is definitely a strange one. I tried making the two methods contain essentially the exact same code, and I still notice an appreciable difference, despite both generating what appears to be the same C code. I also tried switching the order of the method definitions, and this doesn't seem to change anything. However, if I replace the buffer protocol definition of A in the call signature with a typed memoryview (instead of doing it in the method), then the two produce the same timings. – JoshAdel Jun 17 '14 at 19:14
  • 1
    I used double instead of FLOAT_t: sum took 1.17 ms and avg took 1.16 ms. – Padraic Cunningham Jun 17 '14 at 19:21
  • I have figured out that this happens only when I pass float64 arrays. The times are the same if I pass an integer array. – Richie Abraham Jun 17 '14 at 19:22
  • So JoshAdel, does that mean you passed double [:] to the function? The funny thing is that if I pass an extra argument to the function above and just divide sum by it, the times are the same again. So I can just call cython_sum(A, 1) and the jump goes away. – Richie Abraham Jun 17 '14 at 19:24
  • @RichieAbraham Have you tried compiling the C code with different optimisation flags? Does that make a difference? Also, can you not just profile the C code that is generated? – will Jun 18 '14 at 01:00
  • 1
    I am using the -O3 optimisation flag. I haven't profiled the generated C code, though, since the problem disappears when I use a typed memoryview. – Richie Abraham Jun 18 '14 at 14:01
  • I do not see any difference in performance or memory usage between avg and sum. – pv. Jun 26 '14 at 10:32
  • 5
    I can't replicate your issue. cython_avg is just as fast as python_sum for me. See http://nbviewer.ipython.org/gist/nbren12/1b356745aa851d73342f – nbren12 Jun 27 '14 at 18:41
  • 2
    timeit ran a different number of iterations for each test, which can skew the results significantly if there is a long warm-up for the first run! I'm guessing IPython is compiling the Cython code on the first run. Try running your benchmarks again with an identical number of iterations, and/or compile the code in advance by calling each function once before using timeit. – taleinat Jul 15 '14 at 17:15
  • Which versions of cython and numpy are you using? Which version of gcc? – ali_m Jul 30 '14 at 19:16

1 Answer


It seems like the results from @nbren12 are the definitive answer: the reported timings cannot be reproduced.

The evidence (and logic) indicate that both functions have the same runtime.
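
To check this yourself, taleinat's suggestion from the comments (call each function once first, then time every function with the same number of iterations) can be sketched with the standard-library timeit module. This is only a minimal sketch: `py_sum` and `py_avg` are hypothetical pure-Python stand-ins for cython_sum and cython_avg, and `best_time_per_call` is an illustrative helper name, not something from the question.

```python
import timeit


def py_sum(values):
    """Hypothetical pure-Python stand-in for cython_sum."""
    total = 0.0
    for v in values:
        total += v
    return total


def py_avg(values):
    """Hypothetical pure-Python stand-in for cython_avg."""
    return py_sum(values) / len(values)


def best_time_per_call(func, data, number=20, repeat=5):
    """Return the best observed time per call, in seconds.

    Every function is timed with the same `number` and `repeat`, after one
    untimed warm-up call, so one-off start-up costs do not skew the result.
    """
    func(data)  # warm-up call, not timed
    timings = timeit.repeat(lambda: func(data), number=number, repeat=repeat)
    return min(timings) / number


if __name__ == "__main__":
    data = [0.5] * 100000
    for func in (py_sum, py_avg):
        seconds = best_time_per_call(func, data)
        print("%s: %.1f us per call" % (func.__name__, seconds * 1e6))
```

Swapping the stand-ins for the compiled Cython functions and a NumPy array gives a like-for-like comparison that does not depend on how many loops %timeit happens to pick.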

rodrigob