46

I'm trying to understand how Python's garbage collector detects circular references. When I look at the documentation, all I see is a statement that circular references are detected, except when the objects involved have a __del__ method.

If this happens, my understanding (possibly faulty) is that the gc module acts as a failsafe by (I assume) walking through all the allocated memory and freeing any unreachable blocks.

How does Python detect & free circular memory references before making use of the gc module?

juliomalegria
user1245262
  • "..but is _not_ guaranteed to collect garbage containing circular references." Say the [docs](http://docs.python.org/reference/datamodel.html). – Joel Cornett Jun 09 '12 at 16:02
  • Can you link the page of the documentation that you're referring to? – Joel Cornett Jun 09 '12 at 16:05
  • 1
    Here is the page I was reading: http://docs.python.org/extending/extending.html#reference-counts – user1245262 Jun 09 '12 at 16:11
  • Here's a nice (and rather historic) link: http://www.arctrix.com/nas/python/gc/ – Sven Marnach Jun 09 '12 at 16:16
  • @user1245262: It appears that CPython applies a [cycle detection algorithm](http://en.wikipedia.org/wiki/Cycle_detection) to its list of references. http://docs.python.org/c-api/gcsupport.html, http://docs.python.org/release/2.5.2/ext/refcounts.html... – Joel Cornett Jun 09 '12 at 16:26
  • @SvenMarnach -- I think the link you provided (http://www.arctrix.com/nas/python/gc/) gives the answer I was looking for, under the heading "How does this approach work". Do you want to paste it as an answer? It's too long for a comment & I'd feel silly answering my own question using a link you gave me.... Thanks – user1245262 Jun 09 '12 at 16:40
  • 1
    I cannot find a license on the linked page, so I'm not sure it would be ok to copy a longer passage from it to SO. And without copying the content, [the link wouldn't qualify as an answer](http://meta.stackexchange.com/q/8259). – Sven Marnach Jun 09 '12 at 16:56
  • @SvenMarnach - OK, I guess the question is do you want to summarize that section, or should I? I think that explanation should be placed here in a way that's easy to find (and if I'm wrong & the link is more historical than I think easy to be corrected). Since you found it, I'll leave it up to you as to which of us should do the summary. – user1245262 Jun 09 '12 at 17:01
  • 1
    I won't write a summary, so go ahead. BTW, I found the link [in the Python source code](http://hg.python.org/cpython/file/2059910e7d76/Modules/gcmodule.c). There are further links to relevant mailing list threads. The basic concept of the garbage collector seems to be unchanged. – Sven Marnach Jun 09 '12 at 17:04

3 Answers

37

How does Python detect & free circular memory references before making use of the gc module?

It doesn't. The gc exists only to detect and free circular references. Non-circular references are handled through refcounting.
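
A minimal sketch of that division of labour, assuming CPython 3. The `Node` class and the variable names are made up for illustration; a weak reference is used only to observe when the object is actually freed.

```python
import gc
import weakref


class Node:
    """Illustrative container object; not part of any library."""
    def __init__(self):
        self.other = None


gc.disable()             # no automatic collections, so the timing is deterministic
gc.collect()             # start from a clean slate

a, b = Node(), Node()
a.other, b.other = b, a  # a <-> b form a reference cycle
probe = weakref.ref(a)   # a weak reference does not keep `a` alive

del a, b                 # refcounts never reach zero: each object still holds the other
alive_after_del = probe() is not None

gc.collect()             # the cycle detector finds and frees the pair
alive_after_collect = probe() is not None

gc.enable()
print(alive_after_del, alive_after_collect)
```

On CPython the pair should survive the `del` and disappear only after `gc.collect()`, which is the behaviour described above: refcounting alone cannot reclaim the cycle.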

Now, to see how gc determines the set of objects referenced by any given object, take a look at the gc_get_referents function in Modules/gcmodule.c. The relevant bit is:

// Where `obj` is the object whose references we want to find
traverseproc traverse;
if (! PyObject_IS_GC(obj))
    continue;
traverse = Py_TYPE(obj)->tp_traverse;
if (! traverse)
    continue;
if (traverse(obj, (visitproc)referentsvisit, result)) {
    Py_DECREF(result);
    return NULL;
}

The major function here is tp_traverse. Each C-level type defines a tp_traverse function (or in the case of objects which don't hold any references, like str, sets it to NULL). One example of tp_traverse is list_traverse, the traversal function for list:

static int
list_traverse(PyListObject *o, visitproc visit, void *arg)
{
    Py_ssize_t i;

    for (i = Py_SIZE(o); --i >= 0; )
        Py_VISIT(o->ob_item[i]);
    return 0;
}
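
The same traversal can be observed from Python through `gc.get_referents`, which is the Python-level entry point to the C function quoted above. A small sketch, assuming CPython 3; the `inner` and `outer` names are illustrative only.

```python
import gc

inner = {"k": "v"}
outer = [inner, "text", 42]

# For a list, the traversal visits exactly the items the list holds.
referents = gc.get_referents(outer)
holds_inner = any(r is inner for r in referents)

# A str is not a GC-aware type, so there is nothing to traverse.
str_referents = gc.get_referents("text")

# Container types are tracked by the collector; atomic ones are not.
tracked = (gc.is_tracked(outer), gc.is_tracked("text"), gc.is_tracked(42))

print(len(referents), holds_inner, str_referents, tracked)
```

The order in which referents come back is an implementation detail, so the sketch checks membership rather than position.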

"... all I see is a statement that circular references are detected, except when the objects involved have a __del__ method."

You are correct, at least for Python versions before 3.4: the cycle detector can detect and collect cycles unless they contain objects with a __del__ method, because the interpreter has no way to safely delete these objects. For an intuition on why, imagine two objects with __del__ methods that reference each other: in which order should they be finalized? Since Python 3.4 (PEP 442), such cycles are collected as well.

Before Python 3.4, when objects with a __del__ method are involved in a cycle, the garbage collector puts them in a separate list (accessible through gc.garbage) so that the programmer can manually "deal with" them.
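
On Python 3.4 and later (PEP 442), this can be checked directly. The sketch below assumes CPython 3.4+; the `Noisy` class and the `finalized` list are made up for illustration.

```python
import gc

finalized = []


class Noisy:
    """Illustrative class with a finalizer; the name is hypothetical."""
    def __init__(self, name):
        self.name = name
        self.peer = None

    def __del__(self):
        # Record that the finalizer ran for this object.
        finalized.append(self.name)


x, y = Noisy("x"), Noisy("y")
x.peer, y.peer = y, x   # cycle whose members both define __del__
del x, y

gc.collect()

leftover = [o for o in gc.garbage if isinstance(o, Noisy)]
print(sorted(finalized), leftover)
```

On those versions both finalizers should run and nothing from the cycle should end up in `gc.garbage`; on older interpreters the two objects would be parked in `gc.garbage` instead.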

Community
David Wolever
  • 1
    In the docs, I saw the following statement: "The cycle detector is able to detect garbage cycles and can reclaim them so long as there are no finalizers implemented in Python (__del__() methods). When there are such finalizers, the detector exposes the cycles through the gc module (specifically, the garbage variable in that module). " http://docs.python.org/extending/extending.html#reference-counts .... I interpreted this as meaning that gc was a failsafe/slower method. Was I misinterpreting the docs (I easily could have been)? – user1245262 Jun 09 '12 at 16:17
  • 2
    @user1245262, the `__del__` issue isn't really related to finding the garbage. Python does find that such objects are garbage and sticks them in the `gc.garbage` list. The only reason such objects aren't deleted is that Python cannot tell what order is safe to delete them in. – Winston Ewert Jun 09 '12 at 16:24
  • 1
    Ah, sorry, I forgot to address that question. I believe that means: "when there are no objects with `__del__` methods, the cycle detector can reclaim everything. However, since the cycle detector can't safely collect objects with `__del__` methods, cycles involving these objects are exposed through the `gc` module, allowing the programmer to manually clean them up". – David Wolever Jun 09 '12 at 16:24
  • 4
    As of Python 3.4, cycles with `__del__` methods are collected. (PEP 442) – Antimony Jul 16 '16 at 22:09
8

How does Python detect & free circular memory references before making use of the gc module?

Python's garbage collector (not actually the gc module, which is just the Python interface to the garbage collector) does this. So, Python doesn't detect and free circular memory references before making use of the garbage collector.

Python ordinarily frees most objects as soon as their reference count reaches zero. (I say "most" because it never frees, for example, small integers or interned strings.) In the case of circular references, this never happens, so the garbage collector periodically walks memory and frees circularly-referenced objects.
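
A short sketch of the refcounting half, assuming CPython 3; the `Box` class is illustrative only. The collector is switched off to show that an object outside any cycle is freed without it.

```python
import gc
import weakref


class Box:
    """Illustrative object that is not part of any cycle."""


gc.disable()                  # rule out the cycle detector entirely
plain = Box()
probe = weakref.ref(plain)    # lets us observe the object's lifetime
del plain                     # last reference gone: refcount drops to zero
freed_without_gc = probe() is None
gc.enable()

# Allocation thresholds that decide when automatic collections run.
thresholds = gc.get_threshold()
print(freed_without_gc, thresholds)
```

The threshold values themselves vary between Python versions, so the sketch prints them rather than assuming particular numbers.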

This is all CPython-specific, of course. Other Python implementations have different memory management (Jython = Java VM, IronPython = Microsoft .NET CLR).

kindall
  • 1
    Not entirely accurate. Python doesn't walk memory and free unreachable objects. It walks memory and detects references cycles, and frees those. – Winston Ewert Jun 09 '12 at 16:17
  • 1
    Does it “walk memory”? I thought it walked references lists? (or does “walk memory” have a technical definition that I'm unfamiliar with?) – David Wolever Jun 09 '12 at 16:34
  • 1
    The `gc` module *is* the cyclic garbage collector. It's not just the Python interface, it's the implementation. – Sven Marnach Jun 09 '12 at 16:41
  • The `gc` module isn't in sys.modules by default, whereas some other modules whose implementations are built into Python are, so I assumed the `gc` module was just the interface. Thanks for the update. – kindall Jun 09 '12 at 17:05
7

I think I found the answer I'm looking for in some links provided by @SvenMarnach in comments to the original question:

Container objects are Python objects that can hold references to other Python objects. Lists, classes, tuples, etc. are container objects; integers, strings, etc. are not. So only container objects are at risk of being in a circular reference.

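Each container object has a field, *gc_refs* (non-container objects are not tracked by the collector, so as far as I can tell they have no such field). At the start of a collection, *gc_refs* is set to the object's reference count. The collector then visits every container object and decrements *gc_refs* on each container it references, so what remains is the number of references coming from outside the set of container objects.

Any container object whose *gc_refs* count is still greater than 0 is referenced from outside the set of container objects. (The description I read says greater than 1; I would've thought 0, and gcmodule.c does treat any nonzero count as reachable.) These objects are reachable and are removed from consideration as unreachable memory islands.

Any container object reachable from an object known to be reachable (i.e. one we just recognized as having a nonzero *gc_refs* count) also does not need to be freed.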

The remaining container objects are not reachable (except by each other) and should be freed.
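
A toy model of those steps, as a sketch only. It assumes the procedure found in gcmodule.c: copy each refcount into gc_refs, subtract references that come from other containers, and treat any count still above zero as externally reachable. The function name, the dict-based graph and the object names are all made up for illustration; the real collector works on C structures.

```python
def find_unreachable(refcounts, edges):
    """Return the names of container objects that are garbage.

    refcounts: dict mapping name -> total reference count
    edges:     dict mapping name -> list of container names it references
    """
    # Step 1: start gc_refs at each object's reference count.
    gc_refs = dict(refcounts)

    # Step 2: subtract every reference that comes from another container.
    for targets in edges.values():
        for dst in targets:
            gc_refs[dst] -= 1

    # Step 3: anything still above zero is referenced from outside the set.
    reachable = {name for name, count in gc_refs.items() if count > 0}

    # Step 4: everything reachable from those objects is reachable too.
    stack = list(reachable)
    while stack:
        for dst in edges.get(stack.pop(), []):
            if dst not in reachable:
                reachable.add(dst)
                stack.append(dst)

    # Step 5: whatever is left is referenced only from inside the set.
    return set(refcounts) - reachable


# a <-> b is an orphaned cycle; c <-> d is a cycle, but a variable still holds c.
refcounts = {"a": 1, "b": 1, "c": 2, "d": 1}
edges = {"a": ["b"], "b": ["a"], "c": ["d"], "d": ["c"]}
garbage = find_unreachable(refcounts, edges)
print(sorted(garbage))
```

In this example `d` ends up with a gc_refs of zero after step 2, yet it is kept because step 4 reaches it from `c`. That is why the second pass is needed.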

http://www.arctrix.com/nas/python/gc/ is a link providing a fuller explanation. http://hg.python.org/cpython/file/2059910e7d76/Modules/gcmodule.c is a link to the source code, whose comments further explain the thinking behind the circular reference detection.

user1245262
  • I guess my preference for this answer is idiosyncratic, but the links @SvenMarnach provided gave me an answer in terms I could readily understand and explain to someone else. The explanation given by David Wolever was also quite good, and if I want to modify or garbage collect objects I create & implement in C, it will be very useful. – user1245262 Jun 11 '12 at 16:23