I am trying to speed up a Python script. I have profiled the code and already refactored quite a lot in pure Python. It seems that I am still spending a lot of time accessing some NumPy arrays in a way that looks like:

KeyArray[BoolArray[index]]

where KeyArray is ndim=1 and contains strings, BoolArray is ndim=2 and contains bool, and index is an int.
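For concreteness, here is a minimal sketch of that access pattern in plain NumPy. It assumes, as in the function signature further down, that KeyArray is 1-D and BoolArray is 2-D with one row of flags per index; the array contents are made up for illustration.

```python
import numpy as np

# Made-up data: 4 keys and a 3 x 4 table of flags (one row per index).
KeyArray = np.array(["AAA", "BBB", "CCC", "DDD"])
BoolArray = np.array([
    [True, False, True, False],
    [False, False, False, True],
    [True, True, True, True],
])

index = 0
mask = BoolArray[index]      # 1-D boolean row, shape (4,)
selected = KeyArray[mask]    # keys whose flag is True in that row
print(selected)              # ['AAA' 'CCC']
```

Both steps are single NumPy calls, so the per-element work already runs in compiled code.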

I am trying to learn Cython to see how much faster it could be. I wrote the following script, which does not work:

import numpy as np
cimport numpy as np

def fastindexer(np.ndarray[np.str_t,ndim=1] KeyArray, np.ndarray [np.bool_t,ndim=2] BoolArray, np.int_t DateIndex):
    cdef np.ndarray[np.str_t,ndim=1] FArray = KeyArray[BoolArray[DateIndex]]
    return FArray

I understand that types str/bool are not available 'as is' in np arrays. I tried to cast as well but I don't understand how this should be written.

All help welcome

Joe Kington
VincentH
  • @Joe thx for editing. Way easier to read – VincentH Aug 31 '15 at 13:46
  • For what it's worth, moving a single fancy indexing statement to Cython won't speed it up. It's already effectively all C. Instead of focusing on moving things to Cython, is there some way you can re-think your data structures? Can the way you're storing your data be refactored such that `KeyArray[BoolArray[index]]` becomes something more like `KeyArray[index]`? – Joe Kington Aug 31 '15 at 13:48
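One possible reading of that suggestion, sketched under assumptions: if the same rows are looked up many times, the masks could be applied once up front so that each later access is a plain lookup. The names `keys_by_index` and `keys_for` are hypothetical, and whether this helps depends on how often each index is reused and how much memory the precomputed selections take.

```python
import numpy as np

# Made-up data, with KeyArray 1-D and BoolArray 2-D (one row per index).
KeyArray = np.array(["AAA", "BBB", "CCC", "DDD"])
BoolArray = np.array([
    [True, False, True, False],
    [False, False, False, True],
])

# Hypothetical one-time precomputation: apply every mask once, up front.
keys_by_index = [KeyArray[row] for row in BoolArray]


def keys_for(index):
    """Return the preselected keys for one row (a plain list lookup)."""
    return keys_by_index[index]
```

After this setup, `keys_for(index)` returns the same values as `KeyArray[BoolArray[index]]` without repeating the boolean indexing.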

1 Answer

As @Joe said, moving a single indexing statement to Cython won't make it faster. If you decide to move more of your program to Cython, you need to fix a number of problems.

1) You use def instead of cdef or cpdef, so the function can only be called through the slower Python calling convention.
2) You use the old buffer syntax. Read about typed memoryviews, which replace it.
3) Slicing a 2-D array is slow because a new memoryview is created each time. It is still a lot faster than pure Python, but for peak performance you would have to use a different approach.

Here's something to get you started.

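import numpy as np

cpdef func():
    cdef int i
    # Flags are stored as unsigned bytes (0 or 1) so they match a C type
    cdef unsigned char[:] my_bool_array = np.zeros(10, dtype=np.uint8)
    # Strings are stored as raw bytes: one row per string, one byte per character
    cdef unsigned char[:, :] my_string_array = np.zeros((10, 10), dtype=np.uint8)
    cdef unsigned char[:] answer

    for i in range(10):
        # Indexing a 2-D memoryview with one index creates a new 1-D view of that row
        answer = my_string_array[my_bool_array[i]]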
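On the Python side, the str and bool arrays from the question would have to be converted to a byte-sized element type before they can be passed to typed memoryviews. The following is a rough sketch in plain NumPy, assuming ASCII-only keys and a fixed width of 3 characters (both made up for illustration); the variable names are hypothetical.

```python
import numpy as np

KeyArray = np.array(["AAA", "BB", "C"])            # Unicode strings, dtype '<U3'
BoolArray = np.array([[True, False, True],
                      [False, True, False]])

# Reinterpret the bools as unsigned bytes; this is a view, not a copy.
bool_bytes = BoolArray.view(np.uint8)              # values are 0 or 1

# Encode to fixed-width byte strings, then expose one byte per character.
# Shorter strings are padded with zero bytes.
key_fixed = KeyArray.astype("S3")
key_bytes = key_fixed.view(np.uint8).reshape(len(key_fixed), -1)
```

Arrays of dtype uint8 like `bool_bytes` and `key_bytes` correspond to the C type unsigned char, which is the element type a memoryview declaration would need to name.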
rohanp