
Note: I'm tagging this Python and C++ because I've seen examples in both, but the question is language-agnostic.

A function or class method that modifies an object has two choices: modify the data directly in the object in question, or create a new copy and return it while leaving the original untouched. Generally you can tell which is which by looking at what's returned from the function.

Occasionally you will find a function that tries to do both: it modifies the original object and then returns a copy of, or a reference to, that object. Is there ever a case where this provides any advantage over doing only one or the other?

I've seen the example of the Fluent Interface or Method Chaining that relies on returning a reference to the object, but that seems like a special case that should be obvious in context.
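
For reference, here is a minimal Python sketch of that chaining style. The `Query` class and its methods are invented purely for illustration; each method mutates the object and returns `self` so that calls can be strung together.

```python
class Query:
    """Hypothetical fluent builder: every step mutates and returns self."""

    def __init__(self):
        self.parts = []

    def select(self, column):
        self.parts.append(f"SELECT {column}")
        return self  # returning self is what enables chaining

    def from_table(self, table):
        self.parts.append(f"FROM {table}")
        return self

    def where(self, condition):
        self.parts.append(f"WHERE {condition}")
        return self

    def build(self):
        # The final step returns a plain string, not the builder.
        return " ".join(self.parts)


query = Query()
text = query.select("name").from_table("users").where("age > 30").build()
print(text)
```

Here the chained call site itself signals that the same object is being threaded through every step.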

My first bad example comes straight from the Python documentation and illustrates the problem of mutable default parameters. To me this example is unrealistic: if the function modifies its parameter then it doesn't make sense to have a default, and if it returns a copy then the copy should be made before any modifications take place. The problem only exists because it tries to do both.

def f(a, L=[]):
    L.append(a)
    return L
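
To make the surprise concrete, the sketch below calls `f` twice and then contrasts it with two single-purpose alternatives. The names `append_in_place` and `appended` are hypothetical, chosen only for this illustration.

```python
def f(a, L=[]):
    # The default list is created once, when the function is defined.
    L.append(a)
    return L


def append_in_place(a, items):
    """Mutate the caller's list: no default, nothing returned."""
    items.append(a)


def appended(a, items=()):
    """Leave the argument untouched and return a new list."""
    result = list(items)  # copy before any modification
    result.append(a)
    return result


first = f(1)
second = f(2)
print(first is second)  # True: both names refer to the shared default list
print(second)           # [1, 2]

mine = [0]
append_in_place(1, mine)   # mine is changed, as the name promises
fresh = appended(2, mine)  # mine is left alone, fresh is a new list
print(mine, fresh)
```

Neither alternative both mutates and returns, so the shared default never leaks out.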

The second example is the CStringT::MakeUpper function from Microsoft's C++ class libraries (ATL/MFC). The documentation says this about the return value:

Returns a copy of the string but in all uppercase characters.

This leads one to expect that the original remains unchanged. Part of the problem is that the documentation is misleading: if you look at the prototype, you find that it returns a reference to the string. You don't notice this unless you look closely, and assigning the result to a new string compiles with no error. The surprise comes later.
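
A rough Python analogue of the trap, assuming a hypothetical `MutableText` class. This is not the CStringT API, and unlike C++, the assignment here creates an alias rather than a copy. The call reads as though it produces a new value, yet the original has been changed too.

```python
class MutableText:
    """Hypothetical mutable string wrapper, for illustration only."""

    def __init__(self, value):
        self.value = value

    def make_upper(self):
        # Mutates in place AND returns the same object, much like a
        # C++ member function that returns a reference to *this.
        self.value = self.value.upper()
        return self


original = MutableText("hello")
shouted = original.make_upper()  # reads as if it produced a new value

print(original.value)       # HELLO: the original was changed as well
print(shouted is original)  # True: no separate object was created
```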

Mark Ransom
  • I'm not sure this question has an answer other than people's opinions. My opinion is strongly no, it doesn't make sense. And your comment on Python's mutable default argument example is something I've been saying for a while; the real problem isn't with the mutable default argument, but that the argument (which may still be used externally) is modified in-place. – Ben Oct 09 '12 at 22:53
  • @Ben, it could be answered by a good counter-example. – Mark Ransom Oct 09 '12 at 22:54
  • I'm calling BS on the Python example. It mutates its argument and returns it, but the problem also occurs with functions not returning anything - it won't return the same object over and over again, but mutations by the function body are still preserved, and after all the problem is this unexpected preservation. And this happens as soon as mutable state is shared more than you intend/expect (e.g. by mutable default arguments), no matter if it's unintentionally shared even more by returning it. –  Oct 09 '12 at 22:55
  • To give an example (not of the actual topic but of mutable default arguments), my code base has numerous visitor-ish traversals of (cyclic) object graphs which use a "seen" `set` to break cycles. In every case (well, except two whose purpose is creating a set of nodes), the seen set is internal and nothing is returned from the visitor actions. Yet if I were to give actions a `seen=set()` default parameter to avoid specifying the empty set when invoking the action for the first node, I would get wrong results on every run after the first one, because the internal state was accidentally shared. –  Oct 09 '12 at 23:02
  • @delnan, a function isn't expected to modify its argument unless it's explicitly documented as doing so, thus the default argument is never modified. If modifications are necessary then a copy should be made regardless of whether the default is used or not. – Mark Ransom Oct 09 '12 at 23:04
  • @delnan I like functional programming too, so perhaps I'm a little out of step with imperative programming conventions (although I work with Python code every day at work and I'm well aware of the number of difficult bugs caused by accidentally mutating function arguments). But I *would* argue that a function should **never** modify its arguments **unless** it is clear from the name/documentation/etc that modifying the argument is the *point* of calling the function. – Ben Oct 09 '12 at 23:08
  • The only thing I can say is that I've cursed many many times about Python's sort method (and other similar ones) not returning anything, thereby forcing me to create names for temporary vectors. I know it sorts in place. The language doesn't really have to remind me of that by adding constant inconvenience. – rici Oct 09 '12 at 23:08
  • @MarkRansom and Ben: For a lot of functions, mutating an argument is their primary purpose (and of course for many others it's not), and they *are* documented to do so. Consider half of the methods on mutable collections, for instance. And in other cases (such as my example), mutation is one of a few purposes of the functions, but not relevant to all callers, so it's made optional (and hence supplying a default). Making a copy of default arguments only makes sense if you know that default arguments are shared; if you assume they are created anew for every call, of course it doesn't occur to you. –  Oct 09 '12 at 23:09
  • @rici you need to check out `sorted`. – Mark Ransom Oct 09 '12 at 23:09
  • @delnan, and as I stated in the question if a function is intended to mutate an argument then it shouldn't have a default. Although your example with `seen` might make a good counter-argument I don't think it's strong enough. – Mark Ransom Oct 09 '12 at 23:12
  • @MarkRansom, fair point, but that just serves to demonstrate the value of having a method which both mutates and returns a value. Perhaps I'm not understanding your original question. – rici Oct 09 '12 at 23:14
  • @delnan I think that assuming argument defaults are created anew is only a problem if you're used to C++ or another language that creates a copy for you automatically. – Mark Ransom Oct 09 '12 at 23:14
  • @rici, no it's exactly the opposite - there's one function to mutate in-place and another to return a modified result. You don't have one function that does both. – Mark Ransom Oct 09 '12 at 23:15
  • @MarkRansom I would say "shouldn't have a default" and "shouldn't use the Python shortcut which leads to dangerous sharing" are two different things. I know there are some cases (and I suspect there are more) where the most convenient API may accept an object and mutate it, but still makes sense without it (hence the caller isn't required to pass it). Of course you can't use Python's convenient syntax for it directly, but if you say `def f(seen=None): if seen is None: seen = set()` you're doing the same thing, just less pretty (and more correct). –  Oct 09 '12 at 23:16
  • Personally I don't like the Python behavior, and would rather see what you've suggested. This Python 'feature' has tripped me up in the past a few times too. I think it's important to note, however, the mindset in which this functionality was devised: You're not specifying [] to be the default argument in calling the function, but to be the default argument in defining the function. With that understanding, this behavior makes perfect sense. It's really uncomfortable, but it still makes sense. – Jonathan Vanasco Oct 09 '12 at 23:17
  • @MarkRansom "I think that assuming argument ..." -- Possibly, but not important. The reasons for assuming defaults are not shared don't matter in that they don't fix any bugs and headaches caused by the wrong assumption, can't really help eliminating the problem, etc. - it's a problem no matter what causes it. –  Oct 09 '12 at 23:18
  • IMHO and as others have said, this kind of design decision should take into consideration **The Principle of least astonishment** (https://en.wikipedia.org/wiki/Principle_of_least_astonishment) and even if there are some special cases where this can't be applied we should remember that `Special cases aren't special enough to break the rules.` -- Python Zen – mouad Oct 09 '12 at 23:19
  • Ruby does one worse, it conditionally gives you a copy: http://www.ruby-doc.org/core-1.9.3/String.html#method-i-capitalize-21 – Tom Kerr Oct 09 '12 at 23:30
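
Two idioms raised in the comments above can be sketched briefly: the `None` sentinel for an optional mutable argument, and the split between `list.sort` and `sorted`. The `visit` function and the sample graph are hypothetical and only stand in for the kind of traversal described.

```python
def visit(node, graph, seen=None):
    """Depth-first walk of a possibly cyclic graph (hypothetical example)."""
    if seen is None:
        seen = set()  # fresh set per top-level call, never shared
    if node in seen:
        return  # already visited: this is what breaks cycles
    seen.add(node)
    for neighbour in graph[node]:
        visit(neighbour, graph, seen)


graph = {"a": ["b"], "b": ["a", "c"], "c": []}

reached = set()
visit("a", graph, reached)  # the caller opts in to observing the mutation
visit("a", graph)           # internal state is discarded after the call
print(reached)

numbers = [3, 1, 2]
ordered = sorted(numbers)   # returns a new list, numbers is unchanged
result = numbers.sort()     # sorts in place and returns None
print(ordered, numbers, result)
```

In both cases each function either mutates or returns a new value, never both.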

2 Answers


C++ example: the increment/decrement operators

// Post-increment: copy self for the return value, then modify self.
myiterator  operator++(int) {myiterator tmp(*this); operator++(); return tmp;}

// Pre-increment: modify self and return a reference to self.
myiterator&  operator++() {/* Do Stuff*/ return *this;}
Martin York

There are some obvious examples in C++ where you want to modify the object and return a reference:

  1. Assignment:

    T & T::operator=(T && rhs)
    {
        ptr = rhs.ptr;
        rhs.ptr = nullptr;
        return *this;
    }
    

    This one modifies both the object itself and the argument, and it returns a reference to itself. This way you can write a = b = c;.

  2. IOStreams:

    std::ostream & operator<<(std::ostream & os, T const & t)
    {
        os << t.ptr;
        return os;
    }
    

    Again, this allows chaining of operations, std::cout << t1 << t2 << t3;, or the typical "extract and check" if (std::cin >> n) { /* ... */ }.

Basically, returning a reference to one of the input objects always serves to either chain calls or to evaluate the resulting state in some form or another, and there are several useful scenarios for this.

On the other hand, modifying an argument and then returning a copy of the object appears to be less useful.

Kerrek SB
  • These are good examples, but one could argue that they both fall under the category of "method chaining," which the question excludes as a special case. – senderle Oct 09 '12 at 23:24
  • @senderle: Well... if you're going to discard the return value, then discussing the return type is moot. And if you *are* using the return value, then that's always a type of "chaining", isn't it? – Kerrek SB Oct 09 '12 at 23:33
  • @KerrekSB Not if you can come up with a good use case where a different object is returned. –  Oct 09 '12 at 23:34
  • Note that `a = b = c` becomes `a = (b = c)`, so returning a copy would do. For chaining, `(a = b) = c` needs the reference. – GManNickG Oct 09 '12 at 23:51