
I've just run into a situation where pseudo-private class member names aren't getting mangled when using setattr or exec.

In [1]: class T:
   ...:     def __init__(self, **kwargs):
   ...:         self.__x = 1
   ...:         for k, v in kwargs.items():
   ...:             setattr(self, "__%s" % k, v)
   ...:         
In [2]: T(y=2).__dict__
Out[2]: {'_T__x': 1, '__y': 2}

I've tried exec("self.__%s = %s" % (k, v)) as well with the same result:

In [1]: class T:
   ...:     def __init__(self, **kwargs):
   ...:         self.__x = 1
   ...:         for k, v in kwargs.items():
   ...:             exec("self.__%s = %s" % (k, v))
   ...:         
In [2]: T(z=3).__dict__
Out[2]: {'_T__x': 1, '__z': 3}

Doing self.__dict__["_%s__%s" % (self.__class__.__name__, k)] = v would work, but __dict__ is a readonly attribute.

Is there another way that I can dynamically create these pseudo-private class members (without hard-coding the name mangling)?


A better way to phrase my question:

What does python do “under the hood” when it encounters a double underscore (self.__x) attribute being set? Is there a magic function that is used to do the mangling?

chown
  • This is somewhat of an unusual case, because you are allowing the constructor to assign arbitrary private variables. If the constructor can give these variables any value, why even make them private? Can you use named keyword arguments with default values to assign these, instead? – Michael Aaron Safyan Oct 16 '11 at 03:57
  • @MichaelAaronSafyan That's what I was doing originally, but then I wanted to expand T to take any kwarg, and thought about a subclass (call it S) that passes **kwargs through to its super during __init__ without allowing S access to any of those members (since S can see kwargs before calling T.__init__(self, **kwargs), S could potentially break stuff). – chown Oct 16 '11 at 04:13
  • At this point, I'm still writing the code, so I want to see if this is even feasible, and if not, I probably will go back to using something like `def __init__(self, x=1, y=2):` instead. – chown Oct 16 '11 at 04:19
  • From the docs: "Notice that code passed to exec, eval() or execfile() does not consider the classname of the invoking class to be the current class; this is similar to the effect of the global statement, the effect of which is likewise restricted to code that is byte-compiled together. The same restriction applies to getattr(), setattr() and delattr(), as well as when referencing __dict__ directly." Which specifies why `exec` and `setattr` don't work... though I don't know the solution. – tom10 Oct 16 '11 at 04:42
  • The mangled name is hard coded in the function's code object. I tried using `compile`, but it retained the unmangled name. – Eryk Sun Oct 16 '11 at 04:47
  • @tom10 Really? There's a doc that discusses this same issue? Do you have a link to the page? – chown Oct 16 '11 at 04:50
  • Name mangling in Python isn't something you want to use too much. Just stick the attributes in a separate dict and maybe mangle the dict only. – yak Oct 16 '11 at 10:51
  • @chown: Here's a link to the docs I was quoting (I quoted the last paragraph): http://docs.python.org/tutorial/classes.html#private-variables – tom10 Oct 16 '11 at 15:54
  • @tom10 Cool, thanks! I guess now my question naturally becomes: "What does python do “under the hood” when it encounters a double underscore (self.__x) attribute being set? Is there a magic function that is used to do the mangling?" – chown Oct 16 '11 at 16:10
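
The suggestion in the comments above, keeping the dynamic attributes in a separate dict and mangling only that dict, could look roughly like the sketch below. The names `__private`, `get` and `peek` are made up for illustration and are not from the question.

```python
class T:
    def __init__(self, **kwargs):
        # Only this one attribute is mangled (to _T__private);
        # the keys stored inside it are ordinary strings.
        self.__private = dict(kwargs)

    def get(self, name):
        # Compiled inside T, so this reads _T__private.
        return self.__private[name]


class S(T):
    def peek(self, name):
        # Compiled inside S, so this looks up _S__private, which was never set.
        return self.__private[name]


s = S(y=2, z=3)
print(s.get("y"))        # 2
print(sorted(vars(s)))   # ['_T__private']

try:
    s.peek("y")
except AttributeError as exc:
    print("S cannot reach it by accident:", exc)
```

This avoids computing mangled names at runtime, at the cost of the values no longer being individual attributes.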

3 Answers


I believe Python does private attribute mangling during compilation... in particular, it occurs at the stage where it has just parsed the source into an abstract syntax tree, and is rendering it to byte code. This is the only time during execution that the VM knows the name of the class within whose (lexical) scope the function is defined. It then mangles pseudo-private attributes and variables, and leaves everything else unchanged. This has a couple of implications...

  • String constants in particular are not mangled, which is why your setattr(self, "__X", x) is being left alone.

  • Since mangling relies on the lexical scope of the function within the source, functions defined outside of the class and then "inserted" do not have any mangling done, since the information about the class they "belong to" was not known at compile-time.

  • As far as I know, there isn't an easy way to determine (at runtime) what class a function was defined in... At least not without a lot of inspect calls that rely on source reflection to compare line numbers between the function and class sources. Even that approach isn't 100% reliable; there are edge cases that can cause erroneous results.

  • The process is actually rather indelicate about the mangling - if you try to access the __X attribute on an object that isn't an instance of the class the function is lexically defined within, it'll still mangle it for that class... letting you store private class attrs in instances of other objects! (I'd almost argue this last point is a feature, not a bug)
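
The points above can be checked with a small experiment. This is only an illustrative sketch; the class and function names (`Owner`, `Other`, `outside`) are invented for the example.

```python
class Owner:
    def store(self, target):
        # Compiled inside Owner, so the name becomes _Owner__secret,
        # even though target is not an Owner instance.
        target.__secret = 1

    def store_by_string(self, target):
        # The attribute name is a string constant, so it is left alone.
        setattr(target, "__secret", 2)


def outside(self):
    # Defined at module level: there is no enclosing class, so no mangling.
    self.__secret = 3


class Other:
    pass


# Attaching the function afterwards does not change its compiled names.
Owner.outside = outside

owner, other = Owner(), Other()
owner.store(other)
owner.store_by_string(other)
owner.outside()

print(vars(other))   # {'_Owner__secret': 1, '__secret': 2}
print(vars(owner))   # {'__secret': 3}
```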

So the mangling has to be done manually: you calculate what the mangled attr should be, then pass that name to setattr.


Regarding the mangling itself, it's done by the _Py_Mangle function, which uses the following logic:

  • __X gets an underscore and the class name prepended. E.g. if the class is Test, the mangled attr is _Test__X.
  • The only exception is a class name that begins with underscores: these are stripped off. E.g. if the class is __Test, the mangled attr is still _Test__X. (If the class name consists only of underscores, no mangling is done at all.)
  • Trailing underscores in a class name are not stripped.

To wrap this all up in a function...

def mangle_attr(source, attr):
    # return public attrs unchanged
    if not attr.startswith("__") or attr.endswith("__") or '.' in attr:
        return attr
    # if source is an object, get the class
    if not hasattr(source, "__bases__"):
        source = source.__class__
    # strip leading underscores from the class name; a name made up
    # entirely of underscores means no mangling is done
    name = source.__name__.lstrip("_")
    if not name:
        return attr
    # mangle attr
    return "_%s%s" % (name, attr)

I know this somewhat "hardcodes" the name mangling, but it is at least isolated to a single function. It can then be used to mangle strings for setattr:

# you should then be able to use this w/in the code...
setattr(self, mangle_attr(self, "__X"), value)

# note that would set the private attr for type(self),
# if you wanted to set the private attr of a specific class,
# you'd have to choose it explicitly...
setattr(self, mangle_attr(somecls, "__X"), value)

Alternately, the following mangle_attr implementation uses an eval so that it always uses Python's current mangling logic (though I don't think the logic laid out above has ever changed)...

_mangle_template = """
class {cls}:
    @staticmethod
    def mangle():
        {attr} = 1
cls = {cls}
"""

def mangle_attr(source, attr):
    # if source is an object, get the class
    if not hasattr(source, "__bases__"):
        source = source.__class__
    # mangle attr
    tmp = {}
    code = _mangle_template.format(cls=source.__name__, attr=attr)
    eval(compile(code, '', 'exec'), {}, tmp)
    return tmp['cls'].mangle.__code__.co_varnames[0]

# NOTE: the '__code__' attr above needs to be 'func_code' for python 2.5 and older
Eli Collins
  • Wow, you found the actual C function that does the name mangling (`_Py_Mangle`)! That's exactly what I was looking for. Thanks Eli!! – chown Oct 17 '11 at 18:48
  • I'm still not sure whether I will even try to make these attributes private. I'll have to fill out more of the code and see what works best, but if I do, I am going to try both this code and the code from @eryksun's answer. This code looks a bit more portable as it's a function, but thanks to everyone for the good answers! It helped a ton. – chown Oct 17 '11 at 18:51
  • The answer still holds as of today for Python 3.10. – Erdem Tuna May 24 '22 at 19:17

Addressing this:

What does python do “under the hood” when it encounters a double underscore (self.__x) attribute being set? Is there a magic function that is used to do the mangling?

AFAIK, it's basically special cased in the compiler. So once it's in bytecode, the name is already mangled; the interpreter never sees the unmangled name at all, and has no idea of any special handling needed. This is why references through setattr, exec, or by looking up a string in __dict__ don't work; the compiler sees all of those as strings, and doesn't know that they have anything to do with attribute access, so it passes them through unchanged. The interpreter knows nothing of the name mangling, so it just uses them directly.
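
One way to see this for yourself is to inspect the compiled code object of a method. The sketch below reuses the question's class name `T`; the exact contents and order of the tuples can vary between Python versions, so it only checks membership.

```python
class T:
    def __init__(self):
        self.__x = 1              # attribute syntax: mangled by the compiler
        setattr(self, "__y", 2)   # string constant: passed through unchanged


code = T.__init__.__code__

# Attribute names referenced by the bytecode: already mangled.
print("_T__x" in code.co_names)   # True
print("__x" in code.co_names)     # False

# The string handed to setattr is stored as a plain constant.
print("__y" in code.co_consts)    # True
```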

The times I've needed to get around this, I've just manually done the same name mangling, hacky as that is. I've found that using these 'private' names is generally a bad idea, unless it's a case where you know you need them for their intended purpose: to allow an inheritance hierarchy of classes to all use the same attribute name but have a copy per class. Peppering attribute names with double underscores just because they're supposed to be private implementation details seems to cause more harm than benefit; I've taken to just using a single underscore as a hint that external code shouldn't be touching it.
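
As a small illustration of that intended purpose, here is a sketch with a hypothetical `Base`/`Child` pair that both use the same double-underscore name without clobbering each other.

```python
class Base:
    def __init__(self):
        self.__count = 0      # stored as _Base__count

    def base_count(self):
        return self.__count


class Child(Base):
    def __init__(self):
        super().__init__()
        self.__count = 100    # stored as _Child__count, no clash with Base

    def child_count(self):
        return self.__count


c = Child()
print(c.base_count(), c.child_count())   # 0 100
print(sorted(vars(c)))                   # ['_Base__count', '_Child__count']
```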

Ben
  • This is really good stuff Ben! I was surprised to learn that it isn't the interpreter doing the name mangling but the compiler. Thanks for the great info. – chown Oct 17 '11 at 18:48

Here's the hack I have so far. Suggestions for improvement are welcome.

class T(object):

    def __init__(self, **kwds):
        for k, v in kwds.items():
            d = {}
            cls_name = self.__class__.__name__

            eval(compile(
                'class dummy: pass\n'
                'class {0}: __{1} = 0'.format(cls_name, k), '', 'exec'), d)

            d1, d2 = d['dummy'].__dict__, d[cls_name].__dict__
            k = next(k for k in d2 if k not in d1)

            setattr(self, k, v)

>>> t = T(x=1, y=2, z=3)
>>> t._T__x, t._T__y, t._T__z
(1, 2, 3)
Eryk Sun
  • This is an interesting way to do it! At first glance though it seems like this might overcomplicate things for future readers of my script. I'll tinker around with it though, thanks! – chown Oct 16 '11 at 16:19
  • @chown: I hope you can find a less 'hacky' way to go about it, but I did manage to get the mangling without hard coding it in case it changes in the future -- however unlikely that is. Since the mangling is created by the compiler itself I don't know if you'll find a better way to do it. – Eryk Sun Oct 16 '11 at 18:04