
I'm trying to demonstrate to our group the virtues of Cython for enhancing Python performance. I have shown several benchmarks, all of which attain a speed-up just by:

  1. Compiling the existing Python code.
  2. Using cdef to statically type variables, particularly in inner loops.

However, much of our code does string manipulation, and I have not been able to come up with good examples of optimizing code by typing Python strings.

An example I've tried is:

cdef str a
cdef int i,j
for j in range(1000000):
   a = str([chr(i) for i in range(127)])

but typing 'a' as a string actually makes the code run slower. I've read the documentation on 'Unicode and passing strings', but I'm confused about how it applies to the case I've shown. We don't use Unicode; everything is pure ASCII. We're using Python 2.7.2.

Any advice is appreciated.

Veedrac
Paul Nelson
  • Is this Python 2? If so, why use `range`? – Mike Samuel Apr 14 '14 at 15:35
  • You mean, use xrange? – Paul Nelson Apr 14 '14 at 15:50
  • Really, what I meant for the last line of code is: ''.join([chr(i) for i in range(127)]) – Paul Nelson Apr 14 '14 at 15:58
  • I've noticed that declaring: cdef char* a is pretty fast, but that makes the a in the loop a temporary variable, so you have to re-assign within the loop to make it work. – Paul Nelson Apr 14 '14 at 16:17
  • is reassigning a problem? You'd only have to assign once without the benchmarking loop, right? – Mike Samuel Apr 14 '14 at 16:31
  • No, re-assigning is not a problem. Somewhere in the documentation about unicode, bytes, byte arrays, c_string_type, character arrays, encoding, decoding, Py_AsString, etc., I seem to be missing the basic question of: Given that I have strings in Python that are an integral part of my most time-consuming loops, what do I do to optimize them in Cython. – Paul Nelson Apr 14 '14 at 17:03
  • With string manipulation, performance often goes to some combination of unnecessary buffer copies, failure to pre-size buffers, index translation due to unnecessary random access, etc. It's hard to figure out which might be culprits when your benchmark is to generate a string. Maybe try a benchmark which does a left-to-right scan over a string and accumulates content onto a pre-sized output buffer. Maybe a NUL-filtering `tr` would be a good benchmark. – Mike Samuel Apr 14 '14 at 20:19
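The benchmark suggested in the last comment could look roughly like the following pure-Python sketch, which could then be compiled and typed with Cython for comparison. The function name `strip_nul` is hypothetical, and `bytearray` is assumed here as the pre-sized output buffer; the comment does not prescribe either.

```python
def strip_nul(data):
    """Copy data left to right into a pre-sized buffer, skipping NUL bytes."""
    src = bytearray(data)      # input bytes as a sequence of small integers
    out = bytearray(len(src))  # pre-sized: the output is never longer than the input
    n = 0                      # number of bytes written so far
    for b in src:              # one sequential pass, no random access on the input
        if b != 0:
            out[n] = b
            n += 1
    return bytes(out[:n])      # drop the unused tail
```

For example, `strip_nul(b"a\x00b\x00\x00c")` yields the three bytes `abc`. Because the loop body only touches integers and a fixed-size buffer, it is the kind of code where static typing has a chance to pay off.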

1 Answer

I suggest you do your operations on cpython.array.array objects. The best documentation is the C API and the Cython source (see here).

from cpython cimport array

def cfuncA():
    cdef str a
    cdef int i,j
    for j in range(1000):
        a = ''.join([chr(i) for i in range(127)])

def cfuncB():
    cdef:
        str a
        array.array[char] arr, template = array.array('c')
        int i, j

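    for j in range(1000):
        # Allocate a 127-element char array; False means it is not zero-filled.
        arr = array.clone(template, 127, False)

        # Fill the buffer by index using the typed loop variable.
        for i in range(127):
            arr[i] = i

        # Convert to a Python string once, at the end.
        a = arr.tostring()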

Note that the operations required depend very much on what you do with your strings.

$ python2 -m timeit -s "import pyximport; pyximport.install(); import cyytn" "cyytn.cfuncA()"
100 loops, best of 3: 14.3 msec per loop

$ python2 -m timeit -s "import pyximport; pyximport.install(); import cyytn" "cyytn.cfuncB()"
1000 loops, best of 3: 512 usec per loop

So that's roughly a 28x speed-up in this case (14.3 msec versus 512 usec).


Also, FWIW, you can take off another fair few µs by replacing arr.tostring() with arr.data.as_chars[:len(arr)] and typing a as bytes.
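Before comparing timings, it may be worth confirming that the buffer-based approach produces the same bytes as the join-based one. The following is only a pure-Python analogue of the two functions above, not the Cython code itself: `build_join` and `build_buffer` are hypothetical names, and `bytearray` stands in for `cpython.array` so the check runs without compiling anything.

```python
def build_join():
    # Same construction as cfuncA: join a list of one-character strings.
    return ''.join([chr(i) for i in range(127)])


def build_buffer():
    # Same idea as cfuncB: fill a pre-sized byte buffer by index,
    # then convert to an immutable byte string once at the end.
    buf = bytearray(127)
    for i in range(127):
        buf[i] = i
    return bytes(buf)


# Both approaches should yield the same 127 ASCII bytes.
same = build_buffer() == build_join().encode('ascii')
```

Here `same` should come out true, since every code point below 128 encodes to a single ASCII byte.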

saladi
Veedrac