This solution seems to provide the behavior you want:
```cython
# dummy.pyx
# cython: language_level = 3
cimport cython

cdef class MyObject:
    cdef int value_

    def __init__(self, value = 0):
        print("Constructor called.")
        self.value_ = value

    cpdef int value(self):
        return self.value_

@cython.boundscheck(False)
cpdef int get_sum_1(MyObject[::1] my_vect):
    # Plain indexed loop over a C range: the length is known up front.
    cdef int sum_ = 0
    cdef Py_ssize_t i
    cdef Py_ssize_t n = my_vect.shape[0]  # avoid shadowing the builtin len
    for i in range(n):
        sum_ += (<MyObject>my_vect[i]).value()
    return sum_

cpdef int get_sum_2(MyObject[::1] my_vect):
    # Iterator-style loop; uncomment the cdef below to type `obj`.
    cdef int sum_ = 0
    # cdef MyObject obj
    for obj in my_vect:
        sum_ += obj.value()
    return sum_
```
`get_sum_2`, above, uses the `for obj in my_vect` style (i.e., iterators). However, because the type of `obj` is not known in advance and iterators are used (the range of iteration is not known a priori), there is a lot of Python overhead. If you give Cython a hint about the type of `obj` in `get_sum_2` by uncommenting the `cdef MyObject obj` line, you get around a 30-40% speedup (see below for the application code).
Compared with `get_sum_2`, you can get a much larger speedup by using a plain old `for`-loop over an index instead of the iterator, as `get_sum_1` does (roughly 55x in the timings below). And because we know in advance how many elements the typed memoryview has, we can also turn off bounds checking.
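Both functions accumulate into `cdef int sum_`, so it is worth checking that the benchmark's total actually fits. The following is a minimal pure-Python sketch, assuming a 32-bit C `int` (typical on mainstream platforms, but not guaranteed by C). It computes the expected result for `MyObject(0)` … `MyObject(n - 1)` and the largest `n` for which `sum_` would not overflow. The helper names `expected_sum` and `largest_safe_n` are hypothetical, not part of `dummy.pyx`.

```python
# Hypothetical sanity check for get_sum_1 / get_sum_2 (not part of dummy.pyx).
# Assumes my_vect holds MyObject(0), ..., MyObject(n - 1) and a 32-bit C int.
N = 50_000
INT32_MAX = 2**31 - 1


def expected_sum(n: int) -> int:
    """Closed form of 0 + 1 + ... + (n - 1)."""
    return n * (n - 1) // 2


def largest_safe_n(limit: int = INT32_MAX) -> int:
    """Largest n whose sum 0 + ... + (n - 1) still fits below `limit`."""
    n = 0
    while expected_sum(n + 1) <= limit:
        n += 1
    return n


total = expected_sum(N)
print(total)                    # 1249975000
print(total <= INT32_MAX)       # True: `cdef int sum_` does not overflow here
print(sum(range(N)) == total)   # True: closed form matches a direct sum
print(largest_safe_n())         # 65536
```

In other words, the 50,000-element benchmark is safe, but a vector only about 30% larger would overflow `sum_`; switching it to `long long` would be one way around that.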
The application code I have tried is as follows:
```python
# app.py
from numpy import array, median
from timeit import repeat

import pyximport
pyximport.install()

from dummy import MyObject
from dummy import get_sum_1
from dummy import get_sum_2

my_vect = array([MyObject(i) for i in range(50000)])

get_1 = repeat("get_sum_1(my_vect)", repeat=100, number=1, globals=globals())
get_2 = repeat("get_sum_2(my_vect)", repeat=100, number=1, globals=globals())

print(f"Median of get_1: {1000*median(get_1)} ms.")
print(f"Median of get_2: {1000*median(get_2)} ms.")
```
Running `python app.py` prints "Constructor called." 50,000 times, followed by these timings on my laptop:
```
Median of get_1: 0.20261999452486634 ms.
Median of get_2: 11.251458498009015 ms.
```
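Dividing the two medians gives the size of the gap. This is simple arithmetic on the numbers reported above, with nothing re-measured:

```python
import math

# Medians reported above for 50,000 elements, in milliseconds.
median_get_1 = 0.20261999452486634
median_get_2 = 11.251458498009015

speedup = median_get_2 / median_get_1
orders_of_magnitude = math.log10(speedup)

print(f"get_sum_1 is about {speedup:.1f}x faster than get_sum_2")  # 55.5x
print(f"i.e. about {orders_of_magnitude:.2f} orders of magnitude")  # 1.74
```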
If you run `cython --annotate dummy.pyx`, you can see the overheads clearly. In both functions, however, I see nothing but `struct MyObject *` in the generated C code, meaning the elements are accessed through pointers and no copies are made. This is further supported by the absence of any "Constructor called." output during the calls to `get_sum_1` and `get_sum_2` in the application code.
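You can also observe the NumPy side of this no-copy behaviour from pure Python. The sketch below uses a hypothetical plain-Python stand-in, `Counted` (not the Cython `MyObject`), that counts constructor calls. It builds an object array the same way `app.py` does and checks that iterating over the array hands back the original objects instead of new ones. It says nothing about the C code Cython generates; it only shows that the array stores references.

```python
import numpy as np


class Counted:
    """Hypothetical pure-Python stand-in for MyObject that counts constructions."""

    instances = 0

    def __init__(self, value=0):
        Counted.instances += 1
        self.value_ = value

    def value(self):
        return self.value_


objs = [Counted(i) for i in range(5)]
arr = np.array(objs)        # dtype=object: the array stores references
built = Counted.instances   # 5 constructions so far

total = sum(obj.value() for obj in arr)

print(arr.dtype)                               # object
print(total)                                   # 10
print(Counted.instances == built)              # True: no new objects created
print(all(a is b for a, b in zip(arr, objs)))  # True: the very same objects
```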