What is the difference between pybind11's type conversion options?

Question

I have a project that mixes C++ and Python code.

For several reasons, the frontend needs to be in Python and the backend in C++.

Now I am looking for a way to pass my Python objects to C++. One thing to note is that at some point the backend needs to call back into Python to calculate some numbers, and the Python function returns a list of floats.

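I have been looking at the pybind11 type conversion options described here: https://pybind11.readthedocs.io/en/stable/advanced/cast/index.html. In short, the docs list three approaches: (1) use a native C++ type everywhere and expose it to Python through bindings, (2) use a native Python type everywhere and wrap it for C++, and (3) use native types on each side and convert between them.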

To me, option 1 seems fairly easy to use, as this example shows: https://pybind11.readthedocs.io/en/stable/advanced/classes.html#overriding-virtual-functions-in-python

So I am wondering: why would someone choose option 3, and how does it compare with option 1?

Many thanks

The following link may nicely complement your reading of the docs: https://github.com/pybind/pybind11/issues/1201 — Pranav Vempati, Feb 02 '20 at 00:30

score 1 · Accepted Answer · answered Feb 02 '20 at 03:23

Yes, if the main code is in C++ and the bindings are well fleshed out, then option 1 is the easiest to work with, because the bound C++ objects are then as natural to use in Python as native Python classes. It also makes life easier because you get full control over object identity and over whether or not to copy.

For option 3, I find pybind11 too aggressive about copying when callbacks are involved (as seems to be your use case). For example, with NumPy arrays it is perfectly possible to work with the buffer directly on the C++ side once it has been verified to be contiguous. Sure, copying safeguards against memory problems, but too little control is given over copying vs. not copying (NumPy has the same problem, to be sure).
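The contiguity check mentioned above can be sketched in pure Python with the standard buffer protocol, which NumPy arrays also implement. This is a minimal model, not pybind11 code: `as_buffer` is a hypothetical helper that hands back the caller's own memory when it is C-contiguous and only copies when it is not, which is the kind of copy/no-copy control discussed here.

```python
from array import array


def as_buffer(obj):
    """Hypothetical helper: return (buffer, copied).

    Zero-copy when the exported memory is C-contiguous; otherwise the
    elements are copied into a fresh contiguous array.
    """
    view = memoryview(obj)
    if view.c_contiguous:
        return view, False  # work directly on the caller's memory
    # Strided data: gather the elements into new contiguous storage.
    return memoryview(array(view.format, view.tolist())), True


data = array("i", [10, 20, 30, 40])
buf, copied = as_buffer(data)
buf[0] = 99                 # writes through to data, since nothing was copied
print(data[0], copied)      # 99 False

strided, copied = as_buffer(memoryview(data)[::2])
print(strided.tolist(), copied)  # [99, 30] True
```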

Option 3 exists mostly because it improves usability and allows nice syntax. For example, if we have a function with this signature:

void func(const std::vector<int>&)

then it is nice to be able to call it from Python as func((1, 2, 3)) or even func(range(3)). It is convenient, easy to use, and looks clean. But then there is no way around copying, since the memory layout of a tuple is very different from that of a std::vector (and a range does not even represent an in-memory container).
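To make that copy concrete, here is a minimal pure-Python model (not pybind11 code) of what an option-3 conversion to std::vector<int> has to do. The function name `convert_to_std_vector` is hypothetical, and an `array("i", ...)` stands in for the contiguous C++ buffer.

```python
from array import array


def convert_to_std_vector(seq):
    """Model of an option-3 conversion (hypothetical name, not pybind11 API).

    The C++ side needs one contiguous buffer of C ints, so every element of
    the Python sequence is read and copied into newly allocated storage.
    """
    return array("i", seq)  # always a fresh buffer: a full copy


src = [1, 2, 3]
vec = convert_to_std_vector(src)
src.append(4)  # later changes to the Python list do not reach the copy

print(list(vec))                              # [1, 2, 3]
print(list(convert_to_std_vector(range(3))))  # [0, 1, 2]
```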

Note, however, that with the func example above, the caller can still choose to pass a bound std::vector<int> object and thus avoid any copying. It may not look as nice, but it gives full control. This is useful, for example, if the vector is the return value of some other function, or is modified between calls:

v = some_calc()   # with v a bound C++ vector
func(v)
v.append(4)       # add an element
func(v)
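The same idea as a pure-Python model, under the assumption that a bound vector is just a handle to C++-owned storage: `BoundVector` is a hypothetical stand-in for the bound std::vector<int>, and this `func` models the C++ function receiving either a bound object (used as-is) or any other sequence (converted, i.e. copied).

```python
from array import array


class BoundVector:
    """Hypothetical stand-in for a bound std::vector<int> (option 1).

    The Python object is only a handle; the elements live in one buffer
    that the "C++" function can use directly.
    """

    def __init__(self, items=()):
        self.buffer = array("i", items)

    def append(self, value):
        self.buffer.append(value)  # modifies the shared buffer in place


def func(v):
    """Model of func(const std::vector<int>&); returns the data it used."""
    if isinstance(v, BoundVector):
        data = v.buffer       # option 1: use the existing buffer, no copy
    else:
        data = array("i", v)  # option 3: convert, i.e. copy every element
    return data


v = BoundVector([1, 2, 3])
first = func(v)
v.append(4)                   # add an element between calls
second = func(v)
print(first is second, list(second))  # True [1, 2, 3, 4]
```

Passing a plain tuple instead still works, but each call then produces a new copy.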

Contrast this with the case where a list of floats is returned after calculating some numbers, analogous to (but not quite the same as) your description:

std::list<float> calc()

If you choose "option 1", then the bound function calc will return a bound C++ std::list<float> object. If you choose "option 3", then the bound function calc will return a Python list with the contents of the C++ std::list<float> copied into it.

The problem with "option 3" is that if the caller actually wanted a bound C++ object, the values have to be copied back into a new C++ list, for a total of two copies. On the other hand, if you choose "option 1" and the caller instead wants a Python list, they are free to copy the return value of calc themselves:

res = calc()
list_res = list(res)

or even, if they want this all the time:

def pycalc():
    return list(calc())
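The copy counts above can be checked with a small pure-Python model. All names here (`cpp_calc`, `calc_option1`, `calc_option3`, `copy_elements`) are hypothetical, and an `array("f", ...)` stands in for the C++ std::list<float>; the point is only to count element-by-element copies in each scenario.

```python
from array import array

copies = 0  # how many element-by-element copies were made


def copy_elements(src, make):
    """Copy every element of src into a new container created by make."""
    global copies
    copies += 1
    return make(src)


def cpp_calc():
    # Stand-in for the C++ result of std::list<float> calc().
    return array("f", [0.5, 1.5, 2.5])


def calc_option1():
    return cpp_calc()  # option 1: the bound object is handed out as-is


def calc_option3():
    return copy_elements(cpp_calc(), list)  # option 3: copied into a Python list


# Option 3, but the caller wanted a C++ object: copy back into new storage.
copies = 0
cpp_again = copy_elements(calc_option3(), lambda xs: array("f", xs))
copies_option3 = copies

# Option 1, and the caller wants a Python list: one copy, made by choice.
copies = 0
py_list = copy_elements(calc_option1(), list)
copies_option1 = copies

print(copies_option3, copies_option1)  # 2 1
```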

Finally, to your specific case: a Python callback, called from C++, that returns a list of floats. If you use "option 1", the Python function is forced to create and return a C++ list, for example (with cpplist being the name given to a bound std::list<float> type):

def pycalc():
    return cpplist(range(3))

which a Python programmer would not find pretty. If instead you choose "option 3", checking the return type on the C++ side and converting if needed, this would be valid as well:

def pycalc():
    return [x for x in range(3)]

So, depending on the overall requirements and typical use cases, your users may well prefer "option 3".
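The "check the return type and convert if needed" logic can be sketched as a pure-Python model of the C++ backend. `CppFloatList` is a hypothetical stand-in for the bound cpplist type and `call_python_callback` models the C++ code invoking the callback; in real code this check would live on the C++ side.

```python
from array import array


class CppFloatList:
    """Hypothetical stand-in for the bound std::list<float> type ("cpplist")."""

    def __init__(self, items=()):
        self.storage = array("f", items)


def call_python_callback(callback):
    """Model of the C++ backend invoking a Python callback.

    A bound C++ list is used as-is; any other sequence of numbers is
    converted (copied) into float storage, as an option-3 caster would.
    """
    result = callback()
    if isinstance(result, CppFloatList):
        return result.storage  # already C++ storage: no copy needed
    return array("f", (float(x) for x in result))  # plain Python: convert


def pycalc_bound():
    return CppFloatList(range(3))  # option 1 style


def pycalc_plain():
    return [x for x in range(3)]  # option 3 style: a plain Python list
```

Both callbacks are accepted; only the plain-list one pays for a conversion.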

Thanks for all the information. To understand this a bit better: is it correct to say that when using "option 1", pybind11 generates the Python types for me, and those types are laid out differently in memory from their C++ versions? If so, doesn't "option 1" also copy data (implicitly) when passing objects to C++? Lastly, should I expect any performance difference between "option 1" and "option 3" while executing code on the C++ side? — katetsu, Feb 02 '20 at 14:57
Option 1 generates Python proxies that wrap the C++ objects in place, i.e. they hold pointers to the underlying C++ objects and forward everything (calls and data access), so there are no copies, not even implicit ones. For performance, it depends strongly on what you do. For example, iterating over a bound `std::vector` through the proxy is slower than iterating over a tuple (it shouldn't be, but pybind11 has terrible performance), but copying 1M elements just to access a few of them is bad the other way around. So again, it comes down to your specific use cases. — Wim Lavrijsen, Feb 02 '20 at 18:48
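The access-pattern trade-off in that comment can be illustrated with a pure-Python model that counts element reads. `CountingBuffer`, `Proxy` and `convert` are hypothetical names; the model captures only how many elements are touched, not the per-access overhead of a real pybind11 proxy, which is the other half of the trade-off.

```python
class CountingBuffer:
    """Hypothetical C++-owned buffer that counts element reads."""

    def __init__(self, n):
        self._data = list(range(n))
        self.reads = 0

    def __len__(self):
        return len(self._data)

    def __getitem__(self, i):
        self.reads += 1
        return self._data[i]


class Proxy:
    """Option-1 style handle: keeps a reference and forwards every access."""

    def __init__(self, target):
        self._target = target

    def __getitem__(self, i):
        return self._target[i]  # one forwarded call per access, no copy


def convert(buf):
    """Option-3 style conversion: copy every element up front."""
    return tuple(buf[i] for i in range(len(buf)))


n = 1_000_000
buf = CountingBuffer(n)
picked = [Proxy(buf)[i] for i in (0, 500_000, n - 1)]
proxy_reads = buf.reads    # 3

buf.reads = 0
copied = convert(buf)
convert_reads = buf.reads  # 1000000
```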

