
How can I get Cython to generate something like this C++ loop?

int get_sum(const std::vector<MyObject>& my_vect) {
  int sum=0;
  for(const auto& my_obj : my_vect) {
    sum += my_obj.value();
  }
  return sum;
}

When I try writing this in Cython using for my_obj in my_vect, the generated code creates a default-constructed MyObject, then iterates over my_vect and copy-assigns each element to that object.

How can I convince Cython to use references in the for loop instead of doing copies?
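To make the difference concrete, here is a minimal C++ sketch of the two loop shapes. It is not actual Cython output, only an approximation of the behavior described above. The `MyObject` definition, the `g_copy_assignments` counter, and the function names `get_sum_by_copy` and `get_sum_by_ref` are hypothetical stand-ins added for illustration.

```cpp
#include <vector>

// Hypothetical counter, incremented every time MyObject::operator= runs.
int g_copy_assignments = 0;

// Hypothetical stand-in for the question's MyObject.
struct MyObject {
    int v = 0;
    MyObject() = default;
    explicit MyObject(int value) : v(value) {}
    MyObject(const MyObject&) = default;
    MyObject& operator=(const MyObject& other) {
        v = other.v;
        ++g_copy_assignments;  // observable side effect of copy-assignment
        return *this;
    }
    int value() const { return v; }
};

// Approximates the shape described in the question: one default-constructed
// local, then a copy-assignment per element.
int get_sum_by_copy(const std::vector<MyObject>& my_vect) {
    int sum = 0;
    MyObject my_obj;
    for (auto it = my_vect.begin(); it != my_vect.end(); ++it) {
        my_obj = *it;
        sum += my_obj.value();
    }
    return sum;
}

// The range-based for loop from the question: binds a reference, no copies.
int get_sum_by_ref(const std::vector<MyObject>& my_vect) {
    int sum = 0;
    for (const auto& my_obj : my_vect) {
        sum += my_obj.value();
    }
    return sum;
}
```

Both functions return the same sum, but only `get_sum_by_copy` triggers `operator=` (once per element), so any side effect in that operator is observable in one version and absent in the other.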

πάντα ῥεῖ
Thomas Johnson
  • Have you turned on your compiler's optimizer yet? – Jesper Juhl Aug 21 '19 at 17:26
  • The optimizer shouldn't matter. Calling an object's operator= is semantically different from working with a reference to the object. – Thomas Johnson Aug 21 '19 at 17:36
  • It matters a lot if OP is looking at a debug build. An optimized build may well perform orders of magnitude better. – Jesper Juhl Aug 21 '19 at 17:39
  • It's not just a question of performance though. Suppose MyObject's operator= has side-effects (e.g., logging). That side-effect would be triggered in the generated Cython code, but not in the range-based for loop that uses references. The generated Cython code that makes a copy and the range-based for actually do different things. – Thomas Johnson Aug 21 '19 at 17:41
  • That's also true. – Jesper Juhl Aug 21 '19 at 18:19
  • Realistically you can't. Cython uses values rather than references to match Python scoping rules (so `my_obj` is accessible after the loop). If you need to generate specific C++ code then you're much better off just writing the code in C++. – DavidW Aug 22 '19 at 08:53
  • Can you provide a Cython example? Or can you show where it creates a default-constructed `MyObject`? I seem to get normal behavior using typed memory views. – Arda Aytekin Aug 23 '19 at 10:04

1 Answer


This solution seems to provide the behavior you want:

# dummy.pyx
# cython: language_level = 3
cimport cython

cdef class MyObject:
  cdef int value_

  def __init__(self, value = 0):
    print("Constructor called.")
    self.value_ = value

  cpdef int value(self):
    return self.value_

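@cython.boundscheck(False)
cpdef int get_sum_1(MyObject[::1] my_vect):
  cdef int sum_ = 0
  cdef Py_ssize_t i  # C-typed loop index
  cdef Py_ssize_t n = my_vect.shape[0]  # number of elements in the view

  # Index-based loop over the memory view; bounds checks are disabled above.
  for i in range(n):
    sum_ += (<MyObject>my_vect[i]).value()

  return sum_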

cpdef int get_sum_2(MyObject[::1] my_vect):
  cdef int sum_ = 0
  # cdef MyObject obj

  for obj in my_vect:
    sum_ += obj.value()

  return sum_

get_sum_2, above, uses the for obj in my_vect style (i.e., iterators). However, because the type of obj is not known in advance and the loop goes through the iterator protocol (the range of iteration is not known a priori), there is a lot of Python overhead. If you tell Cython the type of obj in get_sum_2 by uncommenting the cdef MyObject obj line, you get around a 30-40% speedup (see the application code below).

You can get a much larger speedup over get_sum_2 (roughly 55x in the timings below) by using a plain old index-based for loop instead of the iterators. Because we know in advance how many elements the typed memory view has, we can also turn off bounds checking.

The application code I have tried is as follows:

# app.py
from numpy import array, median
from timeit import repeat

import pyximport
pyximport.install()

from dummy import MyObject
from dummy import get_sum_1
from dummy import get_sum_2


my_vect = array([MyObject(i) for i in range(50000)])

get_1 = repeat("get_sum_1(my_vect)", repeat=100, number=1, globals=globals())
get_2 = repeat("get_sum_2(my_vect)", repeat=100, number=1, globals=globals())

print(f"Median of get_1: {1000*median(get_1)} ms.")
print(f"Median of get_2: {1000*median(get_2)} ms.")

Running python app.py, I get 50,000 "Constructor called." print statements (one per object created in app.py), followed by these timings on my laptop:

Median of get_1: 0.20261999452486634 ms.
Median of get_2: 11.251458498009015 ms.

If you run cython --annotate dummy.pyx, you should see the overheads clearly. In both examples, however, I see nothing but struct MyObject * in the generated C code. This is further supported by the fact that the constructor's print statement is never triggered inside either function call.

Arda Aytekin
  • My reading of the question is that OP is trying to use Cython's C++ standard library bindings. Python objects are all handled by pointer anyway, so they don't end up being copied unnecessarily. – DavidW Aug 23 '19 at 15:32
  • I am sorry, then, @DavidW. But is it not then similar to, if not the same as, [this question](https://stackoverflow.com/q/21720982/4720025)? – Arda Aytekin Aug 23 '19 at 15:37
  • Not really. The issue is that Cython requires local variables (but not function parameters) to be default constructible. Therefore you can pass by C++ reference to a Cython function, but a loop variable as a reference doesn't work. – DavidW Aug 23 '19 at 16:20
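DavidW's two points (write the loop in C++, and references work as function parameters) suggest keeping the loop in a small C++ helper that takes the vector by const reference. The sketch below is only an illustration under assumptions: the `MyObject` definition and the helper name `sum_values` are hypothetical, and the Cython-side declaration needed to call it is not shown. Copying is deleted on this stand-in type so that the code compiles only if the loop makes no copies.

```cpp
#include <vector>

// Hypothetical stand-in for MyObject. Copy operations are deleted so that
// any accidental copy inside the loop becomes a compile-time error.
struct MyObject {
    int v = 0;
    explicit MyObject(int value) : v(value) {}
    MyObject(const MyObject&) = delete;
    MyObject& operator=(const MyObject&) = delete;
    MyObject(MyObject&&) = default;  // lets std::vector grow by moving elements
    int value() const { return v; }
};

// Hypothetical helper: the vector arrives by const reference (a function
// parameter), and the loop binds each element by const reference as well.
int sum_values(const std::vector<MyObject>& my_vect) {
    int sum = 0;
    for (const auto& my_obj : my_vect) {
        sum += my_obj.value();
    }
    return sum;
}
```

Because the type is non-copyable, a version of the loop that copy-assigned into a local `MyObject` would not compile, which is one way to check that no copies slip in.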