I am trying to define a function that creates a Python class through the C API. The class should derive from an arbitrary Python base type and carry an extra field, void* my_ptr, in its raw C-level object layout. I also want it to reuse Python's __dict__ functionality.

I am not doing this in C, so I don't have access to the C macros. My initial attempt looks like this (pseudocode):

PyType Derive(PyType base) {
  var newType = new PyType(...);

  newType.tp_flags = HeapType | BaseType; // <- this is important,
    // one should be able to add new attributes and otherwise use __dict__, subclass, etc

  ... filling in other things ...

  int my_ptr_offset = base.tp_basicsize; // put my_ptr immediately after the base type's data
  newType.tp_basicsize = my_ptr_offset + sizeof(void*); // instances of the new type
    // will have instance size = base size + size of my_ptr

  ...

  return newType;
}

The problem is that this code breaks when base is builtins.object. In that case tp_basicsize does not include the field that would normally store __dict__, so my_ptr_offset ends up pointing at that field, and the consumer of my_ptr eventually overwrites it.

Any simple Python class that derives from object does not have that problem. E.g.:

class MySimpleClass: pass

On a 64-bit machine:

PyType mySimpleClass = ...;
PyType object = ...;
mySimpleClass.tp_basicsize // <- 32, includes __dict__
object.tp_basicsize // <- 16, does not include space for __dict__
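The same fields can be inspected from Python, where tp_basicsize, tp_dictoffset and tp_weaklistoffset are exposed on a type as `__basicsize__`, `__dictoffset__` and `__weakrefoffset__`. A minimal sketch follows; `describe` and `SlotsOnly` are names made up for illustration, and no numbers are hard-coded because the values vary with the CPython version and platform:

```python
class MySimpleClass:
    pass


class SlotsOnly:
    __slots__ = ()  # suppresses both __dict__ and __weakref__


def describe(tp):
    """Return the layout-related fields of a type as a dict."""
    return {
        "basicsize": tp.__basicsize__,
        "dictoffset": tp.__dictoffset__,
        "weakrefoffset": tp.__weakrefoffset__,
    }


for tp in (object, MySimpleClass, SlotsOnly):
    print(tp.__name__, describe(tp))
```

For object both offsets are 0, meaning it has neither slot, while a plain class reports nonzero values for both.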

I also noticed a similar problem with builtins.Exception.

Right now I just check for Exception and object manually and add 2 * sizeof(void*) to tp_basicsize, which seems to work. But I'd like to understand how to handle that layout properly.

LOST

1 Answer

I think the information you want is in tp_dictoffset of the base. If it is 0, the base doesn't have a __dict__; any other value means it does.

I'm a little unclear on how you're creating your types, but at least when a type is created by calling PyType_Type (the mechanism used internally when you write class X: in Python), a dict is added unless __slots__ is defined. It sounds like this is both what you want to happen and what is happening. This is detailed under "Inheritance" in the section of documentation I linked.

Therefore, if tp_dictoffset == 0 (and assuming you aren't defining __slots__), add sizeof(PyObject*) to account for the dictionary that's implicitly added.
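As a rough illustration of that check, here is a sketch in Python that reads the same fields through their type-level attributes. The helper name `implicit_slot_bytes` is hypothetical. The `__weakref__` part and the `__itemsize__` condition are not from the text above; they are assumptions based on how CPython's type creation behaved before 3.11. On newer CPython versions the implicit `__dict__` is stored outside the tp_basicsize region, so this arithmetic may not match the real layout there.

```python
import ctypes

POINTER_SIZE = ctypes.sizeof(ctypes.c_void_p)


def implicit_slot_bytes(base):
    """Estimate the bytes implicitly appended for __dict__ and __weakref__.

    Hypothetical helper. Assumes the subclass defines no __slots__ and
    that the interpreter uses the older in-object layout (before 3.11).
    """
    extra = 0
    if base.__dictoffset__ == 0:
        # The base has no __dict__ slot, so one pointer is added for it.
        extra += POINTER_SIZE
    if base.__weakrefoffset__ == 0 and base.__itemsize__ == 0:
        # Assumption: a __weakref__ slot is added only for fixed-size bases.
        extra += POINTER_SIZE
    return extra


class Plain:
    pass


for base in (object, Plain, BaseException):
    print(base.__name__, implicit_slot_bytes(base))
```

Under these assumptions object needs two extra pointers, a plain Python class needs none because it already has both slots, and BaseException needs one because it already carries its own dict field.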

DavidW
  • The thing is, as you can see from the example above, there are actually 2 pointer-sized objects added. Do you know which one is the second one? – LOST Mar 25 '20 at 01:34
  • @LOST: That'd be `__weakref__`. It's unclear where `__dict__` or `__weakref__` would even be coming from with code anything like your pseudocode, though, or why they would be allocated over your field instead of afterward if you adjusted `tp_basicsize` like you show. – user2357112 Mar 25 '20 at 01:55
  • `__dict__` and `__weakref__` don't come from the `Py_TPFLAGS_HEAPTYPE` flag. If they're not provided by the base class already, then `type_new` adds them, unless `__slots__` says not to. You don't seem to be calling `type_new`. – user2357112 Mar 25 '20 at 02:04
  • I suspect (and hope) that `new PyType(...)` is really just a C# wrapping round `type_new`. But this seems like the sort of thing that @LOST should be very sure of – DavidW Mar 25 '20 at 18:52
  • @user2357112supportsMonica @DavidW what is the `type_new` you are referring to? We use `PyType_GenericAlloc` and then fill the struct. – LOST Mar 25 '20 at 18:59
  • It's effectively the C equivalent of `type.__new__` - I'd probably access it by calling `type` through `PyObject_CallObject(PyType_Type, ...)` or similar. – DavidW Mar 25 '20 at 19:32
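For reference, the Python-level counterpart of calling `type` through the C API is the three-argument form of `type`. A small sketch follows; `derive` is a hypothetical helper, not part of any API. It shows that types created this way get a working `__dict__` without any manual size arithmetic:

```python
def derive(base, name="Derived"):
    """Create a subclass by calling type(), as a class statement would."""
    return type(name, (base,), {})


for base in (object, Exception):
    derived = derive(base)
    instance = derived()
    instance.extra = 42  # works because __dict__ was added or inherited
    print(base.__name__, derived.__basicsize__, derived.__dictoffset__)
```

A custom field such as my_ptr would still have to be placed relative to the tp_basicsize of the type returned by this call, not that of the original base.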