
I am trying to speed up my Python code with Cython, and so far it is working great. I have just one problem, however: dealing with lists.

Using cython -a myscript.pyx, I can see that the only parts of my code that call Python routines are when I'm dealing with lists.

For example, I have a numpy array (sel1) that I need to split like this:

x1 = numpy.array([t[0] for t in sel1])
y1 = numpy.array([t[1] for t in sel1])
z1 = numpy.array([t[2] for t in sel1])

and I have no idea how to speed this up with Cython.

Another occurrence is when indexing into lists or arrays, like this:

cdef numpy.ndarray[DTYPE_t, ndim=2] init_value_1 = coords_1[0], init_value_2 = coords_2[0]

I am aware that the time is spent in the Python routines used to access the parts of the lists I need, but I currently have no idea how to speed this up.

ali_m
Marlon

1 Answer


Manipulating lists in Cython is inherently more expensive than using numpy arrays or typed memoryviews, since the former necessitates making Python API calls, whereas with the latter it's possible to directly address the underlying C memory buffers. The best way to avoid this overhead is to simply not use lists wherever possible.

You shouldn't really be using list comprehensions to split your sel1 array anyway - it will be much faster to simply index into the columns:

x1 = sel1[:, 0]
y1 = sel1[:, 1]
z1 = sel1[:, 2]
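
As a quick check that column indexing gives the same values as the list comprehensions, here is a minimal sketch in plain Python/NumPy. The contents of sel1 are made up for illustration; the only assumption is that it is a 2-D array with one (x, y, z) triple per row.

```python
import numpy as np

# Hypothetical stand-in for sel1: an (N, 3) array, one (x, y, z) row per point.
sel1 = np.arange(12, dtype=np.float64).reshape(4, 3)

# Approach from the question: a Python-level loop over the rows.
x1_slow = np.array([t[0] for t in sel1])

# Column indexing: no Python loop, and no copy of the data either.
x1 = sel1[:, 0]
y1 = sel1[:, 1]
z1 = sel1[:, 2]

same_values = np.array_equal(x1, x1_slow)  # both approaches agree
is_view = np.shares_memory(x1, sel1)       # x1 is a view onto sel1's buffer
```

Note that the slices are strided views rather than contiguous copies. If later code needs contiguous memory, np.ascontiguousarray(sel1[:, 0]) makes an explicit copy.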

Creating new numpy arrays in Cython will always incur some Python overhead, since they are allocated on the Python heap and accounted for by Python's memory management system. Your cdef line that assigns coords_1[0] and coords_2[0] might be more expensive than it needs to be if coords_1 or coords_2 is a list or tuple rather than a numpy array.
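
If the coordinates do start out as a nested list, one option is to convert them to an array once, up front, so that later indexing only produces views. This is a rough sketch in plain Python/NumPy: the data is invented, and float64 is only a guess at what DTYPE_t stands for in the question.

```python
import numpy as np

# Hypothetical data: coords_1 as a list of 2-D frames built from nested lists.
coords_1_list = [
    [[0.0, 1.0, 2.0], [3.0, 4.0, 5.0]],
    [[6.0, 7.0, 8.0], [9.0, 10.0, 11.0]],
]

# Convert once to a 3-D array (dtype assumed to match DTYPE_t).
coords_1 = np.asarray(coords_1_list, dtype=np.float64)

# Indexing the first axis now returns a 2-D ndarray view with no list access
# and no new data buffer.
init_value_1 = coords_1[0]
```

Here init_value_1 has ndim == 2 and shares memory with coords_1, which is the shape of object a 2-D typed buffer declaration expects.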

ali_m
  • Thanks for your answer, that sped things up nicely. If I understand you correctly, I should also stop using lists (with the append method) and instead create new arrays and fill them, is that right? – Marlon Mar 10 '15 at 12:51
  • Yes, that's definitely advisable when writing Cython (and also good practice for standard Python/numpy code). In Cython you basically want to be using objects that can expose their underlying memory buffers at the level of C code (either numpy arrays or [typed memoryviews](http://docs.cython.org/src/userguide/memoryviews.html#memory-layout)) wherever possible. – ali_m Mar 10 '15 at 13:50
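
The pattern discussed in the comments above (preallocate an array and fill it, rather than appending to a list) can be sketched as follows. The function names and the computation are hypothetical, chosen only to show the two shapes of loop. In plain Python the second version is not necessarily faster; the benefit comes once the array and the loop index are typed in Cython, so that the element assignment can become a direct write into the buffer.

```python
import numpy as np

def squares_with_append(n):
    """List-based version: every append is a Python API call."""
    out = []
    for i in range(n):
        out.append(i * i)
    return np.array(out, dtype=np.float64)

def squares_preallocated(n):
    """Array-based version: allocate once, then fill by index."""
    out = np.empty(n, dtype=np.float64)
    for i in range(n):
        out[i] = i * i
    return out
```

Both functions return the same array; only the second has a loop body that Cython can reduce to plain C indexing once the types are declared.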