1

One of the Python object methods that don't return the modified object is the .add() method of set. This prevents chaining multiple calls to the method:

S = set()
S = S.add('item1').add('item2').add('item3')

giving:

AttributeError: 'NoneType' object has no attribute 'add'

Why do I tend to prefer chaining .add() calls over using .update(), union() or the | operator? Because the result is clear, self-explanatory code that mimics spoken language, and is therefore best suited for private use by occasional programmers, for whom the main issue is being able to read their own code again after some time has passed.

One workaround I know of to make the chaining above possible is to override the set methods. For this purpose I have written the class chainOfSets. With this class I can write:

S = set()
S = chainOfSets(S).add('item1').add('item2').add('item3').get()
print(S) # gives: {'item1', 'item3', 'item2'}

My question is:

Is there a better approach to allow chaining of object methods which don't return the object they manipulate than using a custom class (e.g. chainOfSets, chainOfLists, chainOfPandas, etc.)?



Below is the chainOfSets class, with the + and - operators implemented:

class chainOfSets: 
    """ 
    Allows chaining (by dot syntax) of set() methods that are otherwise
    not chainable, plus addition/subtraction of other sets.
    It doesn't support interaction between objects of this class itself,
    as this is considered out of scope for the purpose for which this
    class was created.
    """
    def __init__(s, sv=set()):
        s.sv = sv
    # ---
    def add(s, itm):
        s.sv.add(itm)
        return s
    def update(s, *itm):
        s.sv.update(itm)
        return s
    def remove(s, itm):     # key error if not in set
        s.sv.remove(itm)
        return s
    def discard(s, itm):    # remove if present, but no error if not
        s.sv.discard(itm)
        return s
    def clear(s):
        s.sv.clear()
        return s
    # ---
    def intersection(s, p):
        s.sv = s.sv.intersection(p)
        return s
    def union(s, p):
        s.sv = s.sv.union(p)
        return s
    def __add__(s, itm):
        if isinstance(itm, set): 
            s.sv = s.sv.union(itm)
        else: 
            s.sv.update(itm)
        return s
    def difference(s,p):
        s.sv = s.sv.difference(p)
        return s
    def __sub__(s, itm):
        if isinstance(itm, set): 
            s.sv = s.sv - itm
        else: 
            s.sv.difference_update(itm)  # in place; difference() alone would discard its result
        return s
    def symmetric_difference(s,p): 
        # equivalent to: union - intersection
        s.sv = s.sv.symmetric_difference(p)
        return s
    # ---
    def len(s):
        return len(s.sv)
    def isdisjoint(s,p):
        return s.sv.isdisjoint(p)
    def issubset(s,p): 
        return s.sv.issubset(p)
    def issuperset(s,p):
        return s.sv.issuperset(p)
    # ---
    def get(s):
        return s.sv
#:class chainOfSets(set) 

print((chainOfSets(set([1,2,3]))+{5,6}-{1}).intersection({1,2,5}).get())
# gives {2,5}

Claudio
  • 1
    You can use a third-party library such as [python-chain](https://github.com/delfick/python-chain) to do this for you. Otherwise, it seems like there's no good solution. – Michael M. Sep 13 '22 at 00:22
  • 7
    Quote from Python's creator Guido van Rossum - "I find the chaining form a threat to readability; it requires that the reader must be intimately familiar with each of the methods. The second form makes it clear that each of these calls acts on the same object, and so even if you don't know the class and its methods very well, you can understand that the second and third call are applied to x (and that all calls are made for their side-effects), and not to something else." - https://mail.python.org/pipermail/python-dev/2003-October/038855.html – matszwecja Sep 13 '22 at 00:23
  • 3
    Oh, and also `set()` default parameter value is gonna be a problem sooner or later - ["Least Astonishment" and the Mutable Default Argument](https://stackoverflow.com/questions/1132941/least-astonishment-and-the-mutable-default-argument) – matszwecja Sep 13 '22 at 00:30
  • 3
    Do you need `S` to be modified as a side effect? You could simply write `S = set(['item']).symmetric_difference({'item2'}) | set({'item3'})`. Python is quite deliberate about either mutating an object or returning a value, and not doing both in the same operation. – chepner Sep 13 '22 at 01:14
  • @chepner: interestingly, doing it as you propose changes the 'order' of the elements put in set S. How can that be? It seems to go from right to left, `{'item3', 'item2', 'item1'}`, not from left to right, or at least `{'item3', 'item1', 'item2'}` as I would expect. It's a bit weird. It's not chaining. – Claudio Sep 13 '22 at 01:33
  • `S = set(['item1']) | set(['item2']) | set(['item3'])` gives: {'item2', 'item1', 'item3'} . It's also not chaining as the pipe is not a pipe ... Now I am fully confused again. – Claudio Sep 13 '22 at 01:38
  • Sets are unordered; whatever you see in its string representation or while iterating over it is an arbitrary decision by the interpreter, not necessarily related to the order in which you add items to the set. – chepner Sep 13 '22 at 11:59
  • Yes, sets are unordered. But in the Python version I use, the arbitrary decision seems to always be the same (an implementation detail one shouldn't rely on?), so a different order can be considered a **hint** that the same thing done with different approaches does not work the same way - right or wrong? – Claudio Sep 13 '22 at 12:53
  • @Claudio: The arbitrary ordering seems the same until: 1) You construct the same `set` values with a different sequence of insertions and deletions (`{-2, -1}` and `{-1, -2}` are an example where they're the same values, but iterate differently, relying on a CPython implementation detail), or 2) You construct the `set` using anything that is affected by Python hash randomization (e.g. `str`, `datetime.datetime`, `bytes`-like types, etc.). – ShadowRanger Sep 13 '22 at 18:36
  • Side-note: Your `copy` method is inherently broken if used the way normal copy methods are used. `s1 = chainOfSets({'a'})`, `s2 = s1.copy()`, `s2.add('b')` will modify what is seen in both `s1` and `s2` (because you copied the contained `set` and replaced it, but didn't copy the `chainOfSets`, so `s1` and `s2` are aliases to the same object). If you'd passed an existing `set` to the original `chainOfSets` constructor, calling `copy` dissociates the one in the `chainOfSets` from the `set` passed, but that's not enough. – ShadowRanger Sep 13 '22 at 18:43
  • If for some reason this terrible design was absolutely necessary, you could always subclass `set` and save the hassle of reimplementing methods you don't need to change (e.g. `is*` methods, `__len__`, `pop`, etc.). You'd need to manually override all methods you do want to change (no relying on `__getattr__`, because by inheriting the names would be defined and `__getattr__` would never be invoked), but it would be pretty simple (`def methname(self, /, *args, **kwargs): super().methname(*args, **kwargs); return self`). Downside: If you're not careful, you inherit some methods without wrapping. – ShadowRanger Sep 13 '22 at 19:19
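The subclassing approach described in the last comment could be sketched roughly as follows. This is only an illustration: the class name `ChainSet` and the choice of which methods to wrap are assumptions, not code from the question.

```python
class ChainSet(set):
    """Hypothetical set subclass whose wrapped in-place methods return self."""

    def add(self, item, /):
        super().add(item)
        return self

    def update(self, /, *others):
        super().update(*others)
        return self

    def discard(self, item, /):
        super().discard(item)
        return self


# Wrapped methods can be chained ...
s = ChainSet().add('item1').add('item2').update({'item3'}).discard('item1')

# ... but methods that were not overridden keep their inherited behaviour,
# so remove() still returns None and would break a chain.
remove_result = ChainSet({1}).remove(1)
```

Everything that is not overridden (`__len__`, `isdisjoint`, `pop`, and so on) is inherited unchanged, which is both the convenience and the downside the comment mentions.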

2 Answers

1

Write Python, not some other language

You can make this work with a lot of effort. You shouldn't though. Python has pretty firm rules on methods of built-in types:

  1. If an apparently mutating method on an instance of X always returns an instance of X, it is creating a new modified instance and leaving the original instance unchanged
  2. All apparently mutating methods that modify in-place return either None (most common case) or something that is not (typically, aside from nested container cases) an instance of X (seen with stuff like the pop methods of set and dict)
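The two rules can be checked against a few built-in types. A minimal sketch (the variable names are only illustrative):

```python
# Rule 1: methods that return an instance of the same type build a new
# object and leave the original unchanged.
text = "abc"
upper = text.upper()           # new str; text is still "abc"
base = {1, 2}
merged = base.union({3})       # new set; base is still {1, 2}

# Rule 2: methods that modify in place return None ...
numbers = [3, 1, 2]
sort_result = numbers.sort()   # numbers is now [1, 2, 3]
add_result = base.add(3)       # base is now {1, 2, 3}

# ... or something that is not an instance of the container type.
popped = numbers.pop()         # the removed element, not a list
```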

These rules exist in part because Guido van Rossum (the creator of Python) finds arbitrary method chaining ugly and unreadable:

I find the chaining form a threat to readability; it requires that the reader must be intimately familiar with each of the methods. The [line-per-call] form [of the example code] makes it clear that each of these calls acts on the same object, and so even if you don't know the class and its methods very well, you can understand that the second and third call are applied to x (and that all calls are made for their side-effects), and not to something else.

Experienced Python programmers come to rely on these rules. Your proposed class intentionally violates the rules, in an effort to make idioms from other languages work in Python. But there's no reason to do this. For simple stuff like chained adds, just use update/|= or union/| (depending on whether you want to make a new set or not):

S = set()

# In-place options:
S.update(('item1', 'item2', 'item3'))
# or
S |= {'item1', 'item2', 'item3'}

# Not-in-place options
S = S.union(('item1', 'item2', 'item3'))
# or
S = S | {'item1', 'item2', 'item3'}

All of those are perfectly simple, fast, and require no custom types.

In basically every case you'll encounter in the real world, where you truly want to chain multiple unrelated methods that can't be applied as a single bulk method as in this case, your proposed class would save you a line or two. (If you really insist on compacting it all into a single line, you can always separate calls with semicolons on the same line, or fake it as a single expression by making a tuple from all the call results that begins or ends in the original object and indexing it so the expression evaluates to said object; it's no worse than what you're trying to do with a custom class.) But it would be slower (simply wrapping as in your question adds some overhead; dynamic wrapping via __getattr__ as in your answer is much more expensive), uglier, and unidiomatic. Code gets read more often than it's written, and it's frequently read by people who are not you; chasing maximum succinctness at the expense of writing code that introduces unnecessary new types that violate the idioms of the language they're written in helps no one.
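For illustration only, the two single-line alternatives mentioned above might look like this (neither is recommended over plain separate statements):

```python
S = set()

# Semicolon form: several statements on one physical line.
S.add('item1'); S.add('item2'); S.add('item3')

# Tuple form: every call is evaluated for its side effect, and indexing
# the tuple with [0] makes the whole expression evaluate to the set itself.
T = set()
T = (T, T.add('item1'), T.add('item2'), T.add('item3'))[0]
```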

ShadowRanger
  • If I have properly understood what you say, you vote for deleting the question as it does not fit into the main Python philosophy? – Claudio Sep 13 '22 at 20:18
  • Your answer is off-topic. The question is: *"Is there a better approach to allow **chaining** of object methods."* and your answer is: don't use chaining ... – Claudio Sep 13 '22 at 21:24
-1

Is there a better approach to allow chaining of object methods which don't return the object they manipulate than using a custom class (e.g. chainOfSets, chainOfLists, chainOfPandas, etc.)?

A better approach than the one provided in the question is to write a short, general class that works for all Python objects and their methods, instead of writing a voluminous separate class for each kind of object.

The core of the mechanism that makes this possible in Python is that attribute access, including looking up a method before calling it, falls back to the __getattr__ method when the normal lookup via __getattribute__ fails. Overriding __getattr__ is sufficient to intercept the calls and forward them to their proper destination.
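The fallback behaviour can be seen in isolation with a small sketch. The class `Demo` and its attribute names are hypothetical and serve only to show when __getattr__ runs:

```python
class Demo:
    """Hypothetical class used only to show when __getattr__ runs."""

    def __init__(self):
        self.present = "found by normal lookup"

    def __getattr__(self, name):
        # Called only after normal attribute lookup has failed.
        return f"fallback for {name!r}"


d = Demo()
found = d.present      # normal lookup succeeds; __getattr__ is not called
missing = d.absent     # normal lookup fails; __getattr__('absent') is called
```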

The code below (named chainPy, chainObj or objProxy to mirror what it will be used for) does the 'trick' of intercepting method calls, forwarding them to the right destination and checking their return value. The class always remembers either the return value or the modified object and returns itself for the next link in the chain. At the end of the chain, the final result is retrieved with the .get() method of the class:

Important Note: the purpose of chainPy is to help chain object methods which modify the object in place and return None, so only ONE chainPy object and ONE identifier should be used in the code, to avoid side effects with e.g. the copy() method. The final link in the chain should be .get(), and the chainPy object shouldn't be reused later on (thanks to ShadowRanger for pointing this out in the comments).

class chainPy:
    def __init__(s, pyObj):
        s._p = pyObj
    def __getattr__(s, method_name):
        def method(*args, **kwargs):
            print(f"chainPy<class>: forwarding: '{method_name}' with {args=} {kwargs=} for pyObj={s._p}")
            bckp_p = s._p
            s._p = getattr(s._p, method_name)(*args, **kwargs)
            if s._p is None:
                s._p = bckp_p
            return s
            # return getattr(s._p, method_name)(*args, **kwargs)
        return method
    def get(s):
        return s._p
# (a proxy is a class working as an interface to something else) 
chainObj = objProxy = chainPy
#:class chainPy

Using the class above, the following code runs as expected, successfully chaining multiple set.add() calls:

S = set()
S = chainPy(S).add('item1').add('item2').add('item3').get()
print(S) # gives: {'item2', 'item1', 'item3'}
Claudio
  • 1
    You're doing a lot more than implementing `add` here though, and many of the methods you delegate to should not be returning `s`. `pop` returns something that's (usually) not `None`, so you'll happily allow it to be called and then fail if you chain further (replacing `.p` with the value popped). Same goes for all `is*` methods (which will replace `.p` with a `bool`). `copy` has aliasing issues because it doesn't obey the normal `copy` rules. And this won't handle any overloaded operators (special methods like `__sub__` and the like bypass `__getattr__`). – ShadowRanger Sep 13 '22 at 18:51
  • 1
    "Chaining methods which return a value doesn't make any sense so it's ok if it fails." Yes, but if nothing else, your `__getattr__` should whitelist methods it makes sense to expose. Providing `pop`/`isdisjoint`/`copy` while having them do something insane is fine for cheap hackery, but you should never use this in any code that must be maintained; eventually, someone *will* inadvertently use such a class outside its original intended use case (e.g. someone forgets to call `.get()` at the end of a chain, and it doesn't break anything until `.pop()` is called on the result later). – ShadowRanger Sep 13 '22 at 19:26
  • 1
    Oh, and one minor note: Unless your goal is to enable people recovering from misuse, there's no reason for `bckp_p` to be an attribute. Just make it a local variable; it only has meaning for the length of the `__getattr__` call anyway. Having a variable set of attributes per-instance is generally a bad idea (it's frowned upon as unPythonic, and modern CPython punishes it because it breaks the key-sharing dictionary optimization causing per-instance memory overhead to increase significantly). – ShadowRanger Sep 13 '22 at 19:27
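The whitelisting suggested in the comments above could be sketched roughly like this. It is only an illustration under stated assumptions: the class name `ChainProxy` and the `chainable` parameter are hypothetical and do not appear in the answer.

```python
class ChainProxy:
    """Hypothetical proxy: only whitelisted in-place methods are chainable."""

    def __init__(self, obj, chainable):
        self._obj = obj
        self._chainable = frozenset(chainable)

    def __getattr__(self, name):
        # Reached only for names not found by normal lookup.
        if name.startswith('_') or name not in self._chainable:
            raise AttributeError(f"{name!r} is not exposed for chaining")
        target = getattr(self._obj, name)

        def call(*args, **kwargs):
            target(*args, **kwargs)   # return value is deliberately ignored
            return self

        return call

    def get(self):
        return self._obj


S = ChainProxy(set(), chainable={'add', 'discard', 'update'})
S = S.add('item1').add('item2').get()

# A method outside the whitelist fails immediately instead of silently
# replacing the wrapped object with its return value.
try:
    ChainProxy(set(), chainable={'add'}).pop
    pop_rejected = False
except AttributeError:
    pop_rejected = True
```

Because the wrapped object is never replaced, methods such as `pop` or `isdisjoint` cannot corrupt the chain; they are simply not reachable through the proxy unless listed explicitly.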