
A few weeks ago I asked a question about speeding up a function written in Python. At that time, TryPyPy pointed out that I could use Cython for this, and kindly gave an example of how to Cythonize that code snippet. I want to do the same with the code below, to see how fast I can make it by declaring variable types. I have read the tutorial on cython.org, but I still have two closely related questions:

  1. I don't know any C. Which parts do I need to learn in order to declare variable types in Cython?
  2. What are the C types corresponding to Python lists and tuples? For example, I can use double in Cython for float in Python. What do I do for lists? In general, where do I find the corresponding C type for a given Python type?

Any example of how I could Cythonize the code below would be really helpful. I have inserted comments in the code that give information about the variable type.

class Some_class(object):
    # ... other attributes and functions ...
    def update_awareness_status(self, this_var, timePd):
        '''Inputs: this_var (type: float)
           timePd (type: int)
           Output: None'''

        max_number = len(self.possibilities)
        # self.possibilities is a list of tuples.
        # Each tuple is a pair of person objects. 

        k = int(math.ceil(0.3 * max_number))
        actual_number = random.choice(range(k))
        chosen_possibilities = random.sample(self.possibilities, 
                                         actual_number)
        if len(chosen_possibilities) > 0:
            # chosen_possibilities is a list of tuples, each tuple is a pair
            # of person objects. I have included the code for the Person class
            # below.
            for p1,p2 in chosen_possibilities:

                # awareness_status is a tuple (float, int)
                if p1.awareness_status[1] < p2.awareness_status[1]:                   
                    if p1.value > p2.awareness_status[0]:
                        p1.awareness_status = (this_var, timePd)
                    else:
                        p1.awareness_status = p2.awareness_status
                elif p1.awareness_status[1] > p2.awareness_status[1]:
                    if p2.value > p1.awareness_status[0]:
                        p2.awareness_status = (this_var, timePd)
                    else:
                        p2.awareness_status = p1.awareness_status
                else:
                    pass     

class Person(object):                                         
    def __init__(self,id, value):
        self.value = value
        self.id = id
        self.max_val = 50000
        ## Initial awareness status.          
        self.awareness_status = (self.max_val, -1)
Mike Pennington
Curious2learn
  • Do you have working pure-Python code? Have you profiled its execution? Where is most time spent? – eat Feb 02 '11 at 10:04
  • The types for lists and tuples are just `list` and `tuple`. C only defines a few types, mostly numeric, so pretty much everything else just uses the same name as you would in Python. – kenm Feb 02 '11 at 14:33
  • @ eat: Yes, I profiled the code and found that most time is spent in the function above. Are you asking where inside this function most time is spent? The whole code takes 47 seconds to run, the above code function takes 22 seconds. It is accessed 79900 times. Thanks! – Curious2learn Feb 02 '11 at 15:49
  • @kwatford: Thanks. That is helpful. Is there a good reference which talks about things like these. I did not find this on the Cython site. – Curious2learn Feb 02 '11 at 15:50
  • @Curious2learn: Not that I've seen. For optimal use of Cython, you're probably going to need to understand C and some of how Python's C-API works. The rest of it mostly falls out of having that knowledge. I would assume the reason that some of this stuff isn't well documented is that the designers and most of the target audience are very familiar with it. Few designers would think it needed saying, and few users needed it to be said. If you think more documentation is needed, I'd bring it up on Cython's mailing list(s). – kenm Feb 02 '11 at 17:21
  • @kwatford: Thank you. It would be great if they have a section that tells people who know Python but don't know any C, where to start. Any recommendations on books or tutorials in C for those not aiming to become experts in C but interested in learning at least enough to use Cython well. Thanks! – Curious2learn Feb 02 '11 at 19:35
  • I know that there is going to be a sprint to work on the Cython docs in the next 6 months or so. Hopefully that will help a bit. How big is k generally? – Justin Peel Feb 02 '11 at 22:08
  • @Justin: `k` is about 220. It will perhaps increase later. I hope they add good examples with other python types (lists, lists of tuples etc.) to the new Cython documentation. – Curious2learn Feb 03 '11 at 02:15
  • As a general rule, if using a JIT or "compiled" (rather than interpreted) code considerably speeds up your program, it's written "for the compiler". There is often some algorithmic improvement you might be able to do that will speed up the code in this case. – Noufal Ibrahim Feb 17 '11 at 18:43

2 Answers


As a general note, you can see exactly what C code Cython generates for every source line by running the cython command with the -a "annotate" option. See the Cython documentation for examples. This is extremely helpful when trying to find bottlenecks in a function's body.

Also, there's the concept of "early binding for speed" when Cython-ing your code. A Python object (like an instance of your Person class) uses generic Python code for attribute access, which is slow in an inner loop. I suspect that if you change the Person class to a cdef class, you will see some speedup. You also need to type the p1 and p2 objects in the inner loop.

Since your code has lots of Python calls (random.sample for example), you likely won't get huge speedups unless you find a way to put those lines into C, which takes a good amount of effort.
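Before moving anything into C, the Python-level calls themselves can sometimes be trimmed. A minimal sketch, assuming the sampling step can be pulled out into a helper: `random.randrange(k)` draws an integer from the same range as `random.choice(range(k))` without building the range first. The names `pick_sample`, `fraction` and `rng` are hypothetical and not part of the original code.

```python
import math
import random


def pick_sample(possibilities, fraction=0.3, rng=random):
    """Return a random-sized random sample of `possibilities`.

    Hypothetical helper mirroring the first lines of
    update_awareness_status; not part of the original code.
    """
    k = int(math.ceil(fraction * len(possibilities)))
    if k == 0:
        # randrange(0) raises ValueError, so handle the empty case here.
        return []
    # Uniform integer in [0, k), without materialising range(k).
    actual_number = rng.randrange(k)
    return rng.sample(possibilities, actual_number)


# Stand-in data: integer pairs instead of pairs of person objects.
pairs = [(i, i + 1) for i in range(100)]
chosen = pick_sample(pairs, rng=random.Random(0))
```

Whether this matters for the overall run time is something only a profile can tell; it does not change the per-pair loop at all.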

You can type variables as tuple or list, but that rarely gives much of a speedup. C arrays are better when you can use them; that is something you'll have to look up.
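To give a feel for what "C array" storage looks like from the Python side, here is a rough sketch using the standard-library `array` module. It assumes the (float, int) awareness status of each person could be kept in two parallel typed arrays rather than in one tuple per person; the names `status_value` and `status_time` and the sample numbers are made up for illustration.

```python
from array import array

# One slot per person, instead of one (float, int) tuple per person.
status_value = array('d', [50000.0, 50000.0, 120.5])  # stored as C doubles
status_time = array('i', [-1, -1, 3])                 # stored as C ints

# Each array is a single contiguous buffer of fixed-size items,
# so an element can be overwritten in place without creating a new tuple.
status_value[0] = 99.5
status_time[0] = 4
```

This is only the data-layout half of the idea. How to hand such a buffer to Cython so that indexing happens at C level depends on the Cython version, so check its documentation before relying on it.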

I get a factor of 1.6 speedup with the trivial modifications below. Note that I had to change some things here and there to get it to compile.

ctypedef int ITYPE_t

cdef class CyPerson:
    # These attributes are placed in the extension type's C-struct, so C-level
    # access is _much_ faster.
    cdef ITYPE_t value, id, max_val
    cdef tuple awareness_status

    def __init__(self, ITYPE_t id, ITYPE_t value):
        # The __init__ function is much the same as before.
        self.value = value
        self.id = id
        self.max_val = 50000
        ## Initial awareness status.          
        self.awareness_status = (self.max_val, -1)

NPERSONS = 10000

import math
import random

class Some_class(object):

    def __init__(self):
        ri = lambda: random.randint(0, 10)
        self.possibilities = [(CyPerson(ri(), ri()), CyPerson(ri(), ri())) for i in range(NPERSONS)]

    def update_awareness_status(self, this_var, timePd):
        '''Inputs: this_var (type: float)
           timePd (type: int)
           Output: None'''

        cdef CyPerson p1, p2
        price = 10  # placeholder value so that the snippet compiles

        max_number = len(self.possibilities)
        # self.possibilities is a list of tuples.
        # Each tuple is a pair of person objects. 

        k = int(math.ceil(0.3 * max_number))
        actual_number = random.choice(range(k))
        chosen_possibilities = random.sample(self.possibilities,
                                         actual_number)
        if len(chosen_possibilities) > 0:
            # chosen_possibilities is a list of tuples, each tuple is a pair
            # of CyPerson objects (the cdef class defined above).
            for persons in chosen_possibilities:
                p1, p2 = persons
                # awareness_status is a tuple (float, int)
                if p1.awareness_status[1] < p2.awareness_status[1]:
                    if p1.value > p2.awareness_status[0]:
                        p1.awareness_status = (this_var, timePd)
                    else:
                        p1.awareness_status = p2.awareness_status
                elif p1.awareness_status[1] > p2.awareness_status[1]:
                    if p2.value > p1.awareness_status[0]:
                        p2.awareness_status = (price, timePd)
                    else:
                        p2.awareness_status = p1.awareness_status
lothario

C has no built-in concept of lists. The basic data types are the integer types (char, short, int, long), float/double (all of which have fairly straightforward mappings to Python) and pointers. If the concept of pointers is new to you, have a look at: Wikipedia:Pointers

Pointers can then be used as tuple/array replacements in some cases. Pointers to chars are the basis of all strings. Say you have an array of integers: you store it as a contiguous chunk of memory with a start address, and you declare the element type (int) and that the variable is a pointer (*):

cdef int *array

Now you can access each element of the array like this:

array[0] = 1

However, the memory has to be allocated first (e.g. using malloc), and Python-style indexing will not work: array[-1] reads whatever happens to be in memory before the array, and the same holds for indexes beyond the end of the reserved space.
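If you want to poke at this without writing any C yet, the standard-library `ctypes` module exposes the same ideas from Python. This is only an illustration of the memory layout, not Cython code; the names `IntArray`, `block`, `ptr` and `last` are made up for the sketch.

```python
import ctypes

n = 5
IntArray = ctypes.c_int * n   # the C type "int[5]"
block = IntArray()            # ctypes allocates and zero-fills the memory

# View the same memory through an int* pointer, as C code would.
ptr = ctypes.cast(block, ctypes.POINTER(ctypes.c_int))
ptr[0] = 1
ptr[n - 1] = 42

# Elements sit back to back: element i lives at start + i * sizeof(int).
start = ctypes.addressof(block)
int_size = ctypes.sizeof(ctypes.c_int)
last = ctypes.c_int.from_address(start + (n - 1) * int_size)

# The pointer carries no length, so ptr[n] would not raise an error;
# it would simply touch memory outside the block. Do not do that.
```

Here `block[0]` and `last.value` read back the two values written through the pointer, which shows that the array and the pointer are just two views of the same chunk of memory.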

More complex types don't map directly to C, but there is often a C way of doing something that does not require the Python types (e.g. a for loop does not need a range array/iterator).

As you noticed yourself, writing good Cython code requires more detailed knowledge of C, so working through a C tutorial is probably the best next step.

rumpel
  • @ rumpel: Thanks for the response. I am going to start learning some C. I hope that the fact that there do not exist types in C that are equivalent to python tuples and lists, does not mean that when I change function `blah` to Cython, I also have to change other functions that output data used as input by `blah`. – Curious2learn Feb 03 '11 at 00:35
  • @Curious No, typically not. You can often access the data in the Python types from C, or, if this is still not fast enough, it sometimes makes sense to convert the data types before and after processing by the Cython function. – rumpel Feb 03 '11 at 08:21