What is the fastest way in Python to build a C array from a list of tuples of floats?

Question

The context: my Python code passes arrays of 2D vertices to OpenGL.

I tested two approaches, one with ctypes, the other with struct, the latter being more than twice as fast.

from random import random
points = [(random(), random()) for _ in xrange(1000)]

from ctypes import c_float
def array_ctypes(points):
    n = len(points)
    return n, (c_float*(2*n))(*[u for point in points for u in point])

from struct import pack
def array_struct(points):
    n = len(points)
    return n, pack("f"*2*n, *[u for point in points for u in point])

Any other alternative? Any hint on how to accelerate such code (and yes, this is one bottleneck of my code)?

I cross-posted this question to newsgroup gmane.comp.python.opengl.user too, which returned similar answers as below. — Jonathan Hartley, Nov 21 '10 at 10:35

score 3 · Answer 1 · answered Nov 11 '10 at 17:37

You can pass numpy arrays to PyOpenGL without incurring any overhead. (The data attribute of the numpy array is a buffer that points to the underlying C data structure, which contains the same information as the array you're building.)

import numpy as np
def array_numpy(points):
    n = len(points)
    return n, np.array(points, dtype=np.float32)

On my computer, this is about 40% faster than the struct-based approach.

Ray

Impressive! I did not want to add the numpy dependency to my code, but it looks like it is worth it. (side note: not specifying the dtype parameter kills the perf by a factor 10) – rndblnch Nov 11 '10 at 17:57
Can this technique be improved further by creating the numpy array up-front, and then just updating elements as required every frame? I'm imagining situations where vertices would mostly be static, but sometimes a portion of them would need updating for animations. – Jonathan Hartley Nov 21 '10 at 11:03
You might also get additional benefits from using numpy to manipulate the arrays once they exist. e.g. you could add an array of velocities to an array of positions. This might be especially good for things like particle systems, where your Python code doesn't need frequent access to the value of the resulting positions. – Jonathan Hartley Nov 23 '10 at 15:58

Jonathan Hartley · Accepted Answer · 2010-11-23T15:59:24.823

You could try Cython. For me, this gives:

function       usec per loop:
               Python  Cython
array_ctypes   1370    1220
array_struct    384     249
array_numpy     336     339

So Numpy only gives about a 15% benefit on my hardware (old laptop running Windows XP), whereas Cython gives about 35% (without any extra dependency in your distributed code).

If you can loosen your requirement that each point is a tuple of floats, and simply make 'points' a flattened list of floats:

def array_struct_flat(points):
    n = len(points)
    return pack(
        "f"*n,
        *[
            coord
            for coord in points
        ]
    )

points = [random() for _ in xrange(1000 * 2)]

then the resulting output is the same, but the timing goes down further:

function            usec per loop:
                    Python  Cython
array_struct_flat           157
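The "same output" claim can be checked by packing both layouts and comparing the bytes. This is only a sketch for Python 3 (`range` rather than `xrange`); the helper names `pack_tuples` and `pack_flat` are made up for illustration and do not appear in the answer.

```python
from itertools import chain
from random import random
from struct import pack


def pack_tuples(points):
    """Pack a list of (x, y) tuples, flattening on the fly."""
    flat = [coord for point in points for coord in point]
    return pack("f" * len(flat), *flat)


def pack_flat(flat_points):
    """Pack an already-flat list [x0, y0, x1, y1, ...]."""
    return pack("f" * len(flat_points), *flat_points)


points = [(random(), random()) for _ in range(1000)]
flat_points = list(chain.from_iterable(points))

# Both layouts produce identical bytes: 4 bytes per float, 2 floats per point.
assert pack_tuples(points) == pack_flat(flat_points)
assert len(pack_flat(flat_points)) == 4 * 2 * len(points)
```

The flat version simply skips the nested comprehension, which is presumably where the saved time comes from.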

Cython might be capable of substantially better than this too, if someone smarter than me wanted to add static type declarations to the code. (Running 'cython -a test.pyx' is invaluable for this: it produces an HTML file showing you where the slowest (yellow) plain Python is in your code, versus Python that has been converted to pure C (white). That's why I spread the code above out onto so many lines: the coloring is done per line, so it helps to spread it out like that.)

Full Cython instructions are here: http://docs.cython.org/src/quickstart/build.html

Cython might produce similar performance benefits across your whole codebase, and in ideal conditions, with proper static typing applied, can improve speed by factors of ten or a hundred.

Daniel Lemire · Answer 3 · 2016-09-06T20:21:43.780

1

If performance is an issue, you do not want to use ctypes arrays with the star operation (e.g., (ctypes.c_float * size)(*t)).

In my test, pack is fastest, followed by the use of the array module with a cast of the address (or using the from_buffer function).

import timeit
repeat = 100
setup="from struct import pack; from random import random; import numpy;  from array import array; import ctypes; t = [random() for _ in range(2* 1000)];"
print(timeit.timeit(stmt="v = array('f',t); addr, count = v.buffer_info();x = ctypes.cast(addr,ctypes.POINTER(ctypes.c_float))",setup=setup,number=repeat))
print(timeit.timeit(stmt="v = array('f',t);a = (ctypes.c_float * len(v)).from_buffer(v)",setup=setup,number=repeat))
print(timeit.timeit(stmt='x = (ctypes.c_float * len(t))(*t)',setup=setup,number=repeat))
print(timeit.timeit(stmt="x = pack('f'*len(t), *t);",setup=setup,number=repeat))
print(timeit.timeit(stmt='x = (ctypes.c_float * len(t))(); x[:] = t',setup=setup,number=repeat))
print(timeit.timeit(stmt='x = numpy.array(t,numpy.float32).data',setup=setup,number=repeat))

The array.array approach is slightly faster than Jonathan Hartley's approach in my test while the numpy approach has about half the speed:

python3 convert.py
0.004665990360081196
0.004661010578274727
0.026358536444604397
0.0028003649786114693
0.005843495950102806
0.009067213162779808

The net winner is pack.
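For readability, here are the two fastest entries from the script written out as plain functions, as a sketch for Python 3. The function names `via_pack` and `via_array` are mine, not from the script. Note that they return different kinds of object: `pack` gives an immutable bytes object, while the array approach gives a ctypes array that shares memory with the `array.array`.

```python
import ctypes
from array import array
from random import random
from struct import pack

floats = [random() for _ in range(2 * 1000)]


def via_pack(values):
    """Return the floats as a bytes object of C floats."""
    return pack("f" * len(values), *values)


def via_array(values):
    """Return an array.array of floats and a ctypes view onto its buffer."""
    buf = array("f", values)
    c_arr = (ctypes.c_float * len(buf)).from_buffer(buf)
    return buf, c_arr


packed = via_pack(floats)
buf, c_arr = via_array(floats)

# Same raw bytes either way.
assert bytes(c_arr) == packed
assert buf.tobytes() == packed
```

Which of the two suits you may depend on whether the consuming API wants a bytes-like object or a ctypes pointer.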

edited Sep 06 '16 at 20:21

answered Aug 30 '16 at 15:44

Daniel Lemire

Fabulous. Reproducible comparative measurements - best answer on the page. – Jonathan Hartley Aug 30 '16 at 16:24
With timeit(number=10), I see a fair amount of variation in the order of the timings. I had to increase it to 1000 before they settled down into a fairly consistent order. – Jonathan Hartley Aug 30 '16 at 20:14
Using Daniel's script, I also measure pack to be among the fastest entries. However, I'm curious because the `*t` syntax in the call to 'pack' means that the list 't' is being unpacked into a tuple for the args of 'pack'. This sounds like there is still some inefficiency here, so it could possibly be improved upon. – Jonathan Hartley Aug 30 '16 at 20:15
With many of the snippets in Daniel's brilliant script, it's possible to shave a little extra time off by extracting as much as possible into the 'setup' portion. For example, creating format strings `'f' * len(floats)` can be done in setup. Taking this further, some of the entries can also be improved by allocating the C array in the setup, and only populating it later. – Jonathan Hartley Aug 30 '16 at 20:19
A new entry, which for me takes less than 66% of the time of any other entry: `SETUP="import cffi; import random; floats = [random.random() for _ in range(2 * {0})]; ffi = cffi.FFI(); x = ffi.new('float[]', len(floats))"`, and then the code to populate the array is `x[0:2000] = floats` – Jonathan Hartley Aug 30 '16 at 20:21
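The suggestion in the comments of hoisting the format string out of the timed statement might look like the sketch below (Python 3.5+ for the `globals` argument of `timeit`). The third entry, a precompiled `struct.Struct`, is my own addition rather than something proposed in the thread, and I have not recorded timings here since they will vary by machine.

```python
import timeit
from random import random
from struct import Struct, pack

floats = [random() for _ in range(2 * 1000)]

# Built once, outside the timed statements.
fmt = "f" * len(floats)
packer = Struct(fmt)  # precompiled format; an addition, not from the comments

env = {"pack": pack, "floats": floats, "fmt": fmt, "packer": packer}

timings = {
    "format built every call": timeit.timeit(
        "pack('f' * len(floats), *floats)", globals=env, number=1000),
    "format built in setup": timeit.timeit(
        "pack(fmt, *floats)", globals=env, number=1000),
    "precompiled Struct": timeit.timeit(
        "packer.pack(*floats)", globals=env, number=1000),
}

for label, seconds in timings.items():
    print(f"{label:26s} {seconds:.4f} s")
```

All three variants still unpack the list with `*floats`, so this only removes the cost of building the format string, not the argument unpacking discussed above.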

Jonathan Hartley · Answer 4 · 2010-11-24T16:23:44.327

There's another idea I stumbled across. I don't have time to profile it right now, but in case someone else does:

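 # untested, but I'm fairly confident it runs
 # using 'flattened points' list, i.e. a list of n*2 floats
 from ctypes import c_float
 from random import random
 points = [random() for _ in xrange(1000 * 2)]
 c_array = (c_float * len(points))()  # instantiate the array type, unpopulated
 c_array[:] = points                  # lengths must match exactly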

That is, first we create the ctypes array but don't populate it. Then we populate it using slice notation. People smarter than I tell me that assigning to a slice like this may help performance. It allows us to pass a list (or another sequence of the same length) directly on the RHS of the assignment, without having to use the *iterable syntax, which would perform some intermediate wrangling of the iterable. I suspect that this is what happens in the depths of creating pyglet's Batches.

Presumably you could just create c_array once, then just reassign to it (the final line in the above code) every time the points list changes.
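A minimal sketch of that "allocate once, reassign every frame" idea, for Python 3. The names `N_FLOATS` and `upload` are hypothetical, and whether this is actually faster than the alternatives is something I have not measured here.

```python
from ctypes import c_float
from random import random

N_FLOATS = 2 * 1000

# Allocate the C array once (ctypes zero-initialises it).
c_array = (c_float * N_FLOATS)()


def upload(points):
    """Copy a flat list of floats into the preallocated array."""
    # Slice assignment needs a sequence of exactly the same length.
    c_array[:] = points
    return c_array


frame_1 = [random() for _ in range(N_FLOATS)]
frame_2 = [random() for _ in range(N_FLOATS)]

upload(frame_1)
upload(frame_2)  # reuses the same array object

# If only part of the data changed, assign to a sub-slice instead.
c_array[0:4] = [0.0, 0.0, 1.0, 1.0]
```

The sub-slice assignment at the end would cover the case of mostly static vertices where only a portion needs updating.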

There is probably an alternative formulation which accepts the original definition of points (a list of (x,y) tuples.) Something like this:

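 # very untested, likely contains errors
 # using a list of n tuples of two floats
 from ctypes import c_float
 from itertools import chain
 from random import random
 points = [(random(), random()) for _ in xrange(1000)]
 c_array = (c_float * (len(points) * 2))()
 # flatten the tuples; slice assignment needs a sequence, so build a list
 c_array[:] = list(chain.from_iterable(points))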

Thanks for the feedback @DanielLemire. Out of interest, did you try both suggested approaches from this answer? — Jonathan Hartley, Aug 30 '16 at 15:41

score 0 · Answer 5 · answered Nov 11 '10 at 17:05

You can use array (notice also the generator expression instead of the list comprehension):

array("f", (u for point in points for u in point)).tostring()

Another optimization would be to keep the points flattened from the beginning.
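For anyone on Python 3, where `tostring()` was renamed to `tobytes()` (and the old name removed in 3.9), a sketch of both variants could look like this. The variable names are illustrative only.

```python
from array import array
from itertools import chain
from random import random
from struct import pack

points = [(random(), random()) for _ in range(1000)]

# Flatten at conversion time...
as_bytes = array("f", chain.from_iterable(points)).tobytes()

# ...or keep the data flat from the start and skip that step.
flat_points = list(chain.from_iterable(points))
as_bytes_flat = array("f", flat_points).tobytes()

# Both match what struct.pack produces for the same floats.
assert as_bytes == as_bytes_flat == pack("f" * len(flat_points), *flat_points)
```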

Toni Ruža

I tried generators in my first attempts, and it turns out that it slows down the functions. – rndblnch Nov 11 '10 at 17:26
(and it also slows down this array version). by the way, even with list comprehension, the array based solution is still 20% slower than the struct version... – rndblnch Nov 11 '10 at 17:30
