2

I'm writing a C extension for Python and going through the documentation, I'm having a hard time understanding member assignment in the __init__ function.

So there, in section 2.2, member assignment is done as follow:

if (first) {
    tmp = self->first;
    Py_INCREF(first);
    self->first = first;
    Py_XDECREF(tmp);
}

The explanation says later:

Our type doesn’t restrict the type of the first member, so it could be any kind of object. It could have a destructor that causes code to be executed that tries to access the first member; or that destructor could release the Global interpreter Lock and let arbitrary code run in other threads that accesses and modifies our object.

If I Py_XDECREF it, self->first will become invalid.

I understand the case where, if the destructor releases the Global interpreter Lock, there is a dangling pointer and my object (self) could be modified, etc... Everything goes out of control.

But why is it an issue the destructor accesses it ? Why this part:

It could have a destructor that causes code to be executed that tries to access the first member;

is an issue ? If the destructor of self->first accesses itself, fine. I don't care, it's its problem.

Hope I'm clear enough. Thanks for your replies.

StinGer
  • 76
  • 1
  • 5
  • not quite sure where the confusion is, but maybe here: the object referred to by `first` could have a reference back to the object being initialised, this other object's destructor could itself change `first` (it's exposed via a `PyMemberDef`) and cause an issue, if the code wasn't being as careful. can write up a better description as an answer if that's about right... – Sam Mason Sep 03 '19 at 11:12
  • Did not think about the back reference problem. Thanks for pointing it out @SamMason However, in such a situation, why on hell would the destructor of `first` do something with the back reference (the `self` of `self->first`) ? – StinGer Sep 03 '19 at 13:17

1 Answers1

2

Whenever you allow arbitrary Python code to run you need to make sure your object is in a valid state.

When you DECREF an object it's possible that arbitrary Python code is run, for example if the object being deleted has a __del__ method (or if it's a C extension method a tp_dealloc). That code could do anything, for example (like mentioned in the quoted text) access the first attribute of the instance.

For example:

c = Custom("Firstname", "Lastname", 10)

class T:
    def __del__(self):
        print("T", c.first)  # access the first attribute of the "c" variable

c.first = T()
c.__init__(T())
c.__init__(T())
c.__init__(T())

Now if your C code would look like this:

Py_XDECREF(self->first);
Py_INCREF(first);
self->first = first;

At the time the T.__del__ would run (triggered by Py_XDECREF) it would access the c.first which at that moment is an invalid object (because it has a reference count of 0).

In this example it accidentally doesn't break (on my computer) because the memory isn't re-used yet. However if it's slightly more complex it often kills the Python process (on my computer):

c = Custom("Firstname", "Lastname", 10)

class T:
    def __del__(self):
        print("T", c.first)

class U:
    def __init__(self, t):
        self.t = t

    def __del__(self):
        print("U")

c.first = U(T())
c.__init__(U(T()))  # repeated multiple times to make the crash more likely
c.__init__(U(T()))
c.__init__(U(T()))

All that can be avoided (and should be avoided) by making sure the object is in a valid state before calling DECREF (not just during __init__ but everywhere!), either by setting it to null:

Py_CLEAR(self->first);  // Sets the field to null before XDECREF is used
Py_INCREF(first);
self->first = first;

Or by replacing it immediately with first:

tmp = self->first;
Py_INCREF(first);
self->first = first;
Py_XDECREF(tmp);

Or if you don't want to repeatedly use this kind of idiom, you can also create a function like this:

static void replace_field(PyObject **field, PyObject *val)
{
    PyObject *tmp = *field;
    *field = val;
    Py_XDECREF(tmp);
}

Which simplifies the code to:

Py_INCREF(first);
replace_field(&self->first, first);
MSeifert
  • 145,886
  • 38
  • 333
  • 352
  • As I always pay attention to what I'm doing in `__del__`, I did not think about such (evil) possibilities. Thanks for the `Py_CLEAR` macro, did not see it at first in the doc. ;) – StinGer Sep 05 '19 at 01:35