You could try Cython. For me, this gives:
function usec per loop:
Python Cython
array_ctypes 1370 1220
array_struct 384 249
array_numpy 336 339
So Numpy only gives 15% benefit on my hardware (old laptop running WindowsXP), whereas Cython gives about 35% (without any extra dependency in your distributed code).
If you can loosen your requirement that each point is a tuple of floats, and simply make 'points' a flattened list of floats:
def array_struct_flat(points):
n = len(points)
return pack(
"f"*n,
*[
coord
for coord in points
]
)
points = [random() for _ in xrange(1000 * 2)]
then the resulting output is the same, but the timing goes down further:
function usec per loop:
Python Cython
array_struct_flat 157
Cython might be capable of substantially better than this too, if someone smarter than me wanted to add static type declarations to the code. (Running 'cython -a test.pyx' is invaluable for this, it produces an html file showing you where the slowest (yellow) plain Python is in your code, versus python that has been converted to pure C (white). That's why I spread the code above out onto so many lines, because the coloring is done per-line, so it helps to spread it out like that.)
Full Cython instructions are here:
http://docs.cython.org/src/quickstart/build.html
Cython might produce similar performance benefits across your whole codebase, and in ideal conditions, with proper static typing applied, can improve speed by factors of ten or a hundred.