0

I have a c++ piece a code, included in a larger native python project, that triggers various random read access violation. I suspect there is an issue with the handling of the reference count but I cannot figure it. The code features a C++ class with 2 attributes wrapped into a Python Object.

    typedef struct
{
    PyObject_HEAD
    MyCustomClass *self;
} PyMyCustomClass;

class MyCustomClass {
public:
    PyObject *values;
    PyObject *incr_values;
    ...
}

Both of the attributes are tuple initialized to None and MyCustomClass features the following methods:

MyCustomClass(){
    values = Py_BuildValue("");
    incr_values= Py_BuildValue("");
}

~MyCustomClass(){
    Py_DECREF(this->values);
    Py_DECREF(this->incr_values);
}

PyObject *get_values() {
    Py_INCREF(this->values);
    return this->values;
}

int set_incr_values( PyObject *new_values) {
    Py_DECREF(this->incr_values);
    Py_INCREF(new_values);
    this->incr_values = new_values;
    return 0;
}

PyObject *compute_incr_values() {
    if( condition )
        return this->get_values(); //new reference
    else { //add 1 to all values
        PyObject *one = Py_BuildValue("i", 1);
        Py_ssize_clean_t size = PyTuple_GET_SIZE(this->values);
        PyObject *new_values = PyTuple_New(size);
        for(Py_ssize_t i = 0; i < size; i++ ) {
            PyObject *item = PyTuple_GET_ITEM(input,i);
            auto add_fct = Py_TYPE(item)->tp_as_number->nb_add;
            PyTuple_SET_ITEM(new_values, i, add_fct(item,one) );
        }
        Py_DECREF(one);
        return new_values; //new reference
    }
}

static PyObject *compute_incr_values(PyMyCustomClass *obj, PyObject *Py_UNUSED) {
    PyObject *new_values = obj->self->compute_incr_values();
    obj->self->set_incr_values(new_values);
    Py_DECREF(new_values); //Get rid of unused object
    Py_RETURN_NONE;
}

The code as presented causes various random read access violation to be triggered in the Python code. However if I remove Py_DECREF(this->values); in the destructor and remove Py_DECREF(new_values); in compute_incr_values method, it then works.

I do not understand the issue here. Is there an issue with the handling of the reference count ?

m3shoot
  • 23
  • 3
  • There's absolutely no error checking after any of the C API calls. The problem could be anywhere – DavidW Oct 07 '22 at 12:04
  • `set_incr_values` definitely has the potential to mess up though if `new_values` is the same as `this->incr_values` – DavidW Oct 07 '22 at 12:08
  • I have removed the error checking for simplicity. And the thing is that this code never breaks. The errors occur in other part of the native python code that seems unrelated. – m3shoot Oct 07 '22 at 13:36

1 Answers1

0

I can see at least two issues with your code.

Your set_incr_values function is broken

int set_incr_values( PyObject *new_values) {
    Py_DECREF(this->incr_values);
    Py_INCREF(new_values);
    this->incr_values = new_values;
    return 0;
}

There's actually two issues here. First it can fail if new_values is the same as this->incr_values. Second, Py_DECREF can cause arbitrary code to be executed (read the big red warning on the documentation for Py_DECREF). Therefore, you must ensure that self is in a valid state before the decref. Doing the assignment first is the easiest way of doing this.

The better way to do it would be:

int set_incr_values( PyObject *new_values) {
    PyObject *old = this->incr_values;
    this->incr_values = new_values;
    Py_INCREF(new_values);
    Py_DECREF(this->incr_values);
    return 0;
}

values is not a tuple

You uncritically use values as a tuple inside compute_incr_values (e.g. PyTuple_GET_SIZE). However, when you create values you assign None to it (which you know because you point it out in the question, although I don't think "tuple initialized" means anything).

    values = Py_BuildValue("");

There's also no error checking. You say in the comments "I have removed the error checking for simplicity". This is generally unhelpful - my first assumption when reading C API code with no error checking is that it's failing just because they're ignoring some exception. That assumption is usually right.

It isn't possible to tell exactly what's wrong with the code because there's no minimal reproducible example. However, there's plenty of issues based on a quick skim-read so I'd be suspicious of the rest of it.

DavidW
  • 29,336
  • 6
  • 55
  • 86