
Class creation seems to never re-define the __dict__ and __weakref__ class attributes (i.e. if they already exist in the dictionary of a superclass, they are not added to the dictionaries of its subclasses), but to always re-define the __doc__ and __module__ class attributes. Why?

>>> class A: pass
... 
>>> class B(A): pass
... 
>>> class C(B): __slots__ = ()
... 
>>> vars(A)
mappingproxy({'__module__': '__main__',
              '__dict__': <attribute '__dict__' of 'A' objects>,
              '__weakref__': <attribute '__weakref__' of 'A' objects>,
              '__doc__': None})
>>> vars(B)
mappingproxy({'__module__': '__main__', '__doc__': None})
>>> vars(C)
mappingproxy({'__module__': '__main__', '__slots__': (), '__doc__': None})
>>> class A: __slots__ = ()
... 
>>> class B(A): pass
... 
>>> class C(B): pass
... 
>>> vars(A)
mappingproxy({'__module__': '__main__', '__slots__': (), '__doc__': None})
>>> vars(B)
mappingproxy({'__module__': '__main__',
              '__dict__': <attribute '__dict__' of 'B' objects>,
              '__weakref__': <attribute '__weakref__' of 'B' objects>,
              '__doc__': None})
>>> vars(C)
mappingproxy({'__module__': '__main__', '__doc__': None})
Géry Ogam

1 Answer


The '__dict__' and '__weakref__' entries in a class's __dict__ (when present) are descriptors that retrieve an instance's dict pointer and weakref pointer from the instance memory layout. They are not the class's own __dict__ and __weakref__ attributes: the class's own __dict__ is served by a descriptor on the metaclass, and the class's own weakref pointer lives in the class's memory layout (type reserves space for it but provides no __weakref__ descriptor).
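As a rough illustration of the distinction above, the following sketch (assuming CPython 3; the class name A and the attribute x are only placeholders) compares the descriptor stored in the class namespace with the one stored on the metaclass:

```python
class A:  # hypothetical example class
    pass

a = A()
a.x = 1

# Entry in the class namespace: a descriptor that serves *instances* of A.
instance_descr = vars(A)['__dict__']
same_dict = instance_descr.__get__(a, A) is a.__dict__
print(type(instance_descr).__name__, same_dict)  # getset_descriptor True

# Entry in the metaclass namespace: serves the class's own __dict__.
class_descr = vars(type)['__dict__']
same_namespace = dict(class_descr.__get__(A, type)) == dict(vars(A))
print(same_namespace)  # True
```

The two descriptors share a name but live in different namespaces and act on different kinds of objects.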

There's no point adding those descriptors if a class's ancestors already provide one. However, a class does need its own __module__ and __doc__, regardless of whether its parents already have one - it doesn't make sense for a class to inherit its parent's module name or docstring.
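A small sketch of the second point, assuming CPython 3 (the class names and the docstring are made up for the example): a subclass gets its own '__doc__' entry, so it does not pick up its parent's docstring.

```python
class A:  # hypothetical parent with a docstring
    """Docstring of A."""

class B(A):  # hypothetical child without one
    pass

print(A.__doc__)              # Docstring of A.
print(B.__doc__)              # None
print('__doc__' in vars(B))   # True: B has its own entry
print('__dict__' in vars(B))  # False: A's descriptor already covers B's instances
```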


You can see the implementation in type_new, the (very long) C implementation of type.__new__. Look for the add_weak and add_dict variables - those are the variables that determine whether type.__new__ should add space for __dict__ and __weakref__ in the class's instance memory layout. If type.__new__ decides it should add space for one of those attributes to the instance memory layout, it also adds getset descriptors to the class (through tp_getset) to retrieve the attributes:

if (add_dict) {
    if (base->tp_itemsize)
        type->tp_dictoffset = -(long)sizeof(PyObject *);
    else
        type->tp_dictoffset = slotoffset;
    slotoffset += sizeof(PyObject *);
}
if (add_weak) {
    assert(!base->tp_itemsize);
    type->tp_weaklistoffset = slotoffset;
    slotoffset += sizeof(PyObject *);
}
type->tp_basicsize = slotoffset;
type->tp_itemsize = base->tp_itemsize;
type->tp_members = PyHeapType_GET_MEMBERS(et);

if (type->tp_weaklistoffset && type->tp_dictoffset)
    type->tp_getset = subtype_getsets_full;
else if (type->tp_weaklistoffset && !type->tp_dictoffset)
    type->tp_getset = subtype_getsets_weakref_only;
else if (!type->tp_weaklistoffset && type->tp_dictoffset)
    type->tp_getset = subtype_getsets_dict_only;
else
    type->tp_getset = NULL;

If add_dict or add_weak is false, no space is reserved and no descriptor is added. One reason for either to be false is that a base class already reserved the space:

add_dict = 0;
add_weak = 0;
may_add_dict = base->tp_dictoffset == 0;
may_add_weak = base->tp_weaklistoffset == 0 && base->tp_itemsize == 0;

This check doesn't actually care about any ancestor descriptors, just whether an ancestor reserved space for an instance dict pointer or weakref pointer, so if a C ancestor reserved space without providing a descriptor, the child won't reserve space or provide a descriptor. For example, set has a nonzero tp_weaklistoffset, but no __weakref__ descriptor, so descendants of set won't provide a __weakref__ descriptor either, even though instances of set (including subclass instances) support weak references.
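The set case can be checked with a short sketch, assuming CPython 3 (S is a hypothetical subclass name):

```python
import weakref

class S(set):  # hypothetical subclass of set
    pass

s = S()
has_weakref_descr = '__weakref__' in vars(S)
has_dict_descr = '__dict__' in vars(S)
print(has_weakref_descr, has_dict_descr)  # False True

# Weak references still work, because set itself reserved the slot.
r = weakref.ref(s)
print(r() is s)  # True
```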

You'll also see an && base->tp_itemsize == 0 in the initialization for may_add_weak - you can't add weakref support to a subclass of a class with variable-length instances.
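The following sketch, assuming CPython 3 (I is a hypothetical subclass name), shows the effect on a subclass of int, whose instances are variable-length: it gets a '__dict__' descriptor but no weakref support.

```python
import weakref

class I(int):  # hypothetical subclass of a variable-length type
    pass

i = I(5)
i.tag = 'x'  # instance attributes work: a dict slot was added

print('__dict__' in vars(I), '__weakref__' in vars(I))  # True False

try:
    weakref.ref(i)
    failed = False
except TypeError as exc:
    failed = True
    print('TypeError:', exc)
```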

user2357112
  • "This check doesn't actually care about any ancestor descriptors, just whether an ancestor reserved space for an instance dict pointer or weakref pointer" Interesting. Indeed, for `'__weakref__'`: `class A(int): pass`, `assert vars(A).get('__weakref__') is None`. However for `'__dict__'`: `assert vars(A).get('__dict__') is not None`. – Géry Ogam Apr 02 '21 at 11:58
  • "There's no point adding those descriptors if a class's ancestors already provide one." Is it because the *descriptor* nature of `__dict__` and `__weakref__` make them already specific to each subclass and their instances through the `instance` and `owner` parameter of `__get__`? In other words, if the `__doc__` and `__module__` were also descriptors they would not need to be redefined for each subclass? – Géry Ogam Apr 02 '21 at 12:16
  • @Maggyero: What you're seeing with `int` is a different thing - see the `&& base->tp_itemsize == 0` in the `may_add_weak` line? `int` has variable-size instances, and not through a separate data pointer in a fixed-size instance like with `list` - an `int` instance is a single variable-length chunk of memory. You can't add weakref support to a subclass of a variable-length type. There's some special handling to support adding `__dict__`, but not `__weakref__`. – user2357112 Apr 02 '21 at 12:17
  • "Is it because the descriptor nature of..." - no. It's because those descriptors have *nothing to do with* the class's `__dict__` and `__weakref__` attributes. – user2357112 Apr 02 '21 at 12:19
  • So according to the checks you provided, a child will add a `'__dict__'` descriptor if an ancestor does not reserve space for a `'__dict__'` descriptor, and add a `'__weakref__'` descriptor if an ancestor does not reserve space for a `'__weakref__'` descriptor and does not have variable-length instances. – Géry Ogam Apr 02 '21 at 14:28
  • I have tested on built-in types: `for T in [int, float, complex, list, tuple, str, bytes, bytearray, set, frozenset, dict]: A = type('A', (T,), {}); print(T.__name__, '__dict__' in vars(A), '__weakref__' in vars(A))` prints: int True False, float True True, complex True True, list True True, tuple True False, str True True, bytes True False, bytearray True True, set True False, frozenset True False, dict True True. – Géry Ogam Apr 02 '21 at 14:29
  • So this means that all built-in types do not reserve space for a `'__dict__'` descriptor, that `float`, `complex`, `list`, `str`, `bytearray`, and `dict` do not reserve space for a `'__weakref__'` descriptor and do not have variable-length instances, and that `int`, `tuple`, `bytes`, `set`, and `frozenset` reserve space for a `'__weakref__'` descriptor or have variable-length instances. Thus it seems to support your analysis of the source code of CPython. – Géry Ogam Apr 02 '21 at 14:37
  • One thing I still do not get is the “There's no point adding those descriptors if a class's ancestors already provide one.” part. My understanding is that we don’t need descriptors in subclasses because attribute retrieval from a class instance like `instance.__weakref__` will look up the descriptor in `type(instance)` and its ancestors, and return `vars(class)['__weakref__'].__get__(instance)`. So thanks to the `instance` argument the descriptor will be able to provide an `instance` or `type(instance)` specific attribute, irrespective of the position of the descriptor in the class hierarchy. – Géry Ogam Apr 02 '21 at 15:40
  • However if `__weakref__` (and `__dict__`) was not a class descriptor but a simple class attribute, `instance.__weakref__` would return `vars(class)['__weakref__']` and therefore have no way to be `instance` or `type(instance)` specific, so it would have to be redefined for each subclass to be `type(instance)` specific, like `__doc__` and `__module__`. But you told me that this understanding is not correct, so I am confused. – Géry Ogam Apr 02 '21 at 15:47
  • @Maggyero: The `__weakref__` descriptor in a class dict is used for accessing the `__weakref__` attribute of the class's instances - it reads the memory area in the instance reserved for the weakref pointer. This works for any instance, including subclass instances, so subclasses don't need their own `__weakref__` descriptor. This descriptor has nothing to do with the class's own `__weakref__` attribute - that attribute is accessed through a metaclass descriptor. – user2357112 Apr 02 '21 at 15:50
  • In contrast, `__module__` and `__doc__` are ordinary class attributes. The `__module__` entry in a class dict *is* used for the class's `__module__` attribute. There is no "instance `__module__`". – user2357112 Apr 02 '21 at 15:53
  • Classes *do* have their own dict pointer and weakref pointer, but these don't show up in `__dict__` - they're embedded directly in the class's memory layout. – user2357112 Apr 02 '21 at 15:56
  • Yes, I understood this. From the start my question assumed I was talking about the `'__weakref__'` descriptor in the class dictionary, not the `'__weakref__'` descriptor in the metaclass dictionary (which by the way does not exist for `type`: `vars(type)['__weakref__']` raises `KeyError`; only the `'__dict__'` descriptor in the metaclass dictionary exists). But that is another topic. So under this assumption, is my understanding correct? – Géry Ogam Apr 02 '21 at 15:56
  • Oh, right, there's no `__weakref__` descriptor in `type` - it has a nonzero weaklistoffset, but no descriptor to provide access through an attribute, like the `set` example. – user2357112 Apr 02 '21 at 15:57
  • If the `__dict__` and `__weakref__` entries in a class dict weren't descriptors, they wouldn't work at all, so it doesn't make much sense to talk about whether they would have to be redefined in subclasses - giving subclasses their own non-descriptor dict entries for `__dict__` and `__weakref__` wouldn't help. – user2357112 Apr 02 '21 at 16:00
  • Sorry I wanted to ask the question differently: if `'__doc__'` and `'__module__'` were descriptors in class dictionaries (like `'__dict__'` and `'__weakref__'`), would it be still necessary to redefine them in each subclass dictionary, or we could just define them once in an ancestor dictionary (like `'__dict__'` and `'__weakref__'`)? – Géry Ogam Apr 02 '21 at 16:10
  • There actually are descriptors for `__doc__` and `__module__`, but those descriptors live in `type`, and they exist because C classes don't store their docstring and module name in their dict. It wouldn't make sense for ordinary classes to have `__doc__` and `__module__` descriptors - even if you moved the docstring and module name into the memory slots C classes use, it'd make more sense to have the descriptors in `type` be responsible for providing access. – user2357112 Apr 02 '21 at 16:11
  • You could try to put the descriptors in `object` and rely on the `owner` argument `__get__` takes, but `__set__` and `__delete__` wouldn't work. – user2357112 Apr 02 '21 at 16:14
  • Thanks, I did not know that the dictionary of `type` had `'__doc__'` and `'__module__'` descriptors. Okay, so the answer is *yes*, we could put the `'__doc__'` and `'__module__'` descriptors in `object`. So I think this confirms my initial conclusion: it is the very nature of descriptors which does not require putting a `'__dict__'` and `'__weakref__'` entry in each subclass dictionary to make them subclass specific like `'__doc__'` and `'__module__'`. Do you confirm? – Géry Ogam Apr 02 '21 at 16:25
  • I wouldn't say it's the nature of descriptors. It just happens that subclasses don't need to override those particular things. There are plenty of descriptors subclasses *do* override - for example, every overridden method - and plenty of non-descriptors subclasses don't override. – user2357112 Apr 02 '21 at 16:50
  • I see your point: it is not only the nature of descriptors, but also the nature of these particular attributes (`__dict__` and `__weakref__`) which do not require overriding in subclasses, as all the necessary information to provide instance-specific or subclass-specific attributes is in the `instance` or `owner` argument passed to the `__get__` method; they do not need extra user information like bound methods. Thanks for this in-depth discussion, I am fully satisfied and even more. Answer accepted! – Géry Ogam Apr 02 '21 at 17:38
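
The metaclass-side points raised in the comments can be checked with a minimal sketch, assuming CPython 3 (the class A and its docstring are placeholders): type carries '__doc__' and '__module__' descriptors but no '__weakref__' descriptor, yet classes can still be weakly referenced.

```python
import weakref

class A:  # hypothetical example class
    """Doc of A."""

# Descriptors on the metaclass give access to a class's docstring and module name.
print('__doc__' in vars(type), '__module__' in vars(type))  # True True
doc = vars(type)['__doc__'].__get__(A, type)
module = vars(type)['__module__'].__get__(A, type)
print(doc)                     # Doc of A.
print(module == A.__module__)  # True

# No '__weakref__' descriptor on type, but the weaklist slot is still there.
print('__weakref__' in vars(type))  # False
ref = weakref.ref(A)
print(ref() is A)                   # True
```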