
As I understand it, the point of having a += method on mutable Sets, is that

val s = collection.mutable.Set(1)
s += 2  // s is now a mutable Set(1, 2)

has an analogous effect to

var s = Set(1) // immutable
s += 2  // s is now an immutable Set(1, 2)

If that's so, why does the += method on the mutable Set return the Set itself? Wouldn't that make code more difficult to refactor? For example,

val s = collection.mutable.Set(1)
val s1 = s += 2  // s and s1 are now mutable Set(1, 2)

can't be refactored to

var s = Set(1) // immutable
var s1 = s += 2  // s is immutable Set(1, 2), s1 is now ()

while maintaining the original meaning. What's the reason behind this design decision?
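A minimal sketch of both halves of the question, assuming Scala 2.13 or 3 with the standard `scala.collection.mutable.Set`. The names `t` and `t1` are illustrative stand-ins for the immutable `s` and `s1`, renamed so both versions fit in one file:

```scala
import scala.collection.mutable

// Mutable version: `+=` is a method that adds to the set in place and
// returns the receiver itself (its result type is `this.type`).
val s = mutable.Set(1)
val s1 = s += 2
println(s1 eq s) // true: s1 is the same object, not a copy

// Immutable version: Set has no `+=` method, so the compiler rewrites
// `t += 2` to the assignment `t = t + 2`, which evaluates to ().
var t = Set(1)
val t1: Unit = (t += 2)
println(t)  // Set(1, 2)
println(t1) // ()

// The resulting contents agree; only the value of the `+=` expression differs.
println(s.toSet == t) // true
```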

joel

2 Answers


(This is obviously just a guess.)

Mutable Scala collections were intended to be used as builders for immutable collections. There is one very prominent and very ancient immutable data structure on the JVM: the String. The corresponding builder (not tied to Scala in any way) already existed in Java: the StringBuilder. If you look at its documentation, you will see more than a dozen overloads of the method append. Each of them returns the StringBuilder itself, which allows you to write the following:

// java code
myBuilder
  .append('h')
  .append('i');

I guess that the Scala collection.mutable API simply imitated the behavior of Java's StringBuilder, but replaced append(...) with the somewhat shorter +=. At the end of the day, it's just an implementation of the classic builder pattern.
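The builder analogy can be made concrete in Scala itself. A hedged sketch, assuming Scala 2.13 or 3, where `+=` on `mutable.Set`, on the builder returned by `Set.newBuilder`, and on Scala's own `StringBuilder` all return the receiver; `ms`, `b`, and `sb` are illustrative names:

```scala
import scala.collection.mutable

// Chained `+=` on a mutable Set: every call returns the same set.
val ms = mutable.Set(1)
ms += 2 += 3 += 4

// The same chaining on a builder for an immutable Set, mirroring
// Java's myBuilder.append('h').append('i').
val b = Set.newBuilder[Int]
b += 1 += 2 += 3
val built: Set[Int] = b.result()

// Scala's StringBuilder (scala.collection.mutable.StringBuilder)
// follows the same convention.
val sb = new StringBuilder
sb += 'h' += 'i'
println(sb.toString) // hi
```

Note that `a += b += c` parses as `(a += b) += c`, so the chaining works only because each call hands back the object it was called on.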

Andrey Tyukin
  • Unfortunately s += 2 += 3 += 4 += 5 raises a compiler error for immutable Sets, again breaking the equivalence – joel Aug 24 '18 at 20:16
  • @JoelBerkeley It doesn't matter all that much. Mutable collections and immutable collections are totally different kinds of beasts. Immutable collections are "data"-like messages that can fly between threads and machines across the network. Mutable collections are "state-monad"-like close-to-the-metal-thingies that model the mutable memory of a single computing node, for some efficient algorithms that run faster with mutation. The sets of scenarios where those two kinds of collections can be applied are essentially disjoint. – Andrey Tyukin Aug 24 '18 at 20:19
  • @JoelBerkeley Ok, maybe not "disjoint" - there are quite a few algorithms that can be implemented in both styles, or even require a mixture of both. But still, those families of data structures are different enough that it wouldn't matter all that much if they had vastly different interfaces. – Andrey Tyukin Aug 24 '18 at 20:22
  • OK. I'm trying to work out the basis for the 1/2 page in _Programming in Scala_ on "The choice of the method names += and -= means that very similar code can work with either mutable or immutable sets." – joel Aug 24 '18 at 20:24
  • @JoelBerkeley "Very similar" - what's "very similar"? It does work "similarly" in the `lhs += singleElement` case. Does that count as "similar" enough? It definitely does not feel like the mutable collection library was inherited from C, whereas the immutable collection library was inherited from Haskell. Those two parts of the API are *rather similar*, I'd dare to say... – Andrey Tyukin Aug 24 '18 at 20:27
  • Yeah that's the case they mean. I was curious why they didn't intend to carry it to its full conclusion - what other factors took priority, which is where your answer comes in – joel Aug 24 '18 at 20:30
  • The immutable version results in `Unit` because [assignment returns `Unit`](https://stackoverflow.com/questions/1998724/what-is-the-motivation-for-scala-assignment-evaluating-to-unit-rather-than-the-v). That's constraint Nr1. Above, I've tried to sketch why it might make sense to make `+=` return the collection itself in the mutable case. That's constraint Nr2. The two parts of the library should behave "similarly". That's constraint Nr3. Looks like "You can have only [2 out of 3](https://en.wikipedia.org/wiki/CAP_theorem)"-situation. – Andrey Tyukin Aug 24 '18 at 20:30
  • @JoelBerkeley Like [this here](http://destylio.com/blog/wp-content/uploads/2016/04/GOOD-FAST-CHEAP-01.jpg), but with 1) " `=` must return `Unit`", 2) " `+=` must return collection itself", 3) "mutable / immutable must be as similar as possible". They picked the first two. – Andrey Tyukin Aug 24 '18 at 20:35
  • @AndreyTyukin "_must_ return Unit"? _must_? I don't think so. I looked at the link you suggested to justify that assignments return units, but that seems to be as "valid" as, say, disallowing single-statement functions (why push something on the stack that you could just use inline?). An optimizer could be trivially taught not to push a value on the stack if it's not going to be picked up; I don't think it has any bearing on the actual language semantics. – Dima Aug 24 '18 at 22:23
  • @Dima As I understood it, if assignment returned the assigned value, then every setter `def setFoo(newFoo: Foo) = { privateFoo = newFoo }` would have to be rewritten with a `; Unit }` at the end, so that a `Unit` instead of `newFoo` is returned. Alternatively, one could let the setters return `newFoo`, but then try to optimize it away later. In this [comment](https://stackoverflow.com/questions/1998724/what-is-the-motivation-for-scala-assignment-evaluating-to-unit-rather-than-the-v#comment1921878_2000517) Odersky claimed that this would have been "pain". Avoiding pain sounds like a good reason. – Andrey Tyukin Aug 24 '18 at 22:35
  • @AndreyTyukin I think it is fairly reasonable to claim that writing a compiler as complex and advanced as Scala's is a "pain" in and of itself. Why doesn't everyone just stick with "C"? :) In general, I just don't see how this would be a problem at all. Pushing/popping a single value on the stack isn't _that_ expensive, and mutations are rare enough in a functional language to begin with. I mean, we are willing to live with `:+` being O(n), and with `.filterNot(_ == null)` making a copy of the entire collection, but not with pushing an extra value on the stack? C'mon! :D – Dima Aug 25 '18 at 11:18
  • Also, you don't need `; Unit` for the setters. Just a type ascription would do if you are really that concerned about that issue... Type ascriptions are highly recommended for public methods, and I just don't see at all how optimizing the `push` away would be "a pain" for the private ones... seems pretty trivial. – Dima Aug 25 '18 at 11:21
  • @Dima You're saying that mutations are "rare enough", the linked answer says that the returned value is discarded "95% of the time", so we are talking about the prevalence `5% * rareEnough`. And even in those super-rare `while((line = read()) != null)`-cases, the fact that assignment returns `Unit` does almost no harm, because it can be fixed with 1-2 extra lines of code... So, I'd say it's mostly bike-shedding, and I actually don't even have any strong opinion about return type of `=` ;-P. Maybe OCaml `let x = ref 0;; x := 1;;` returning `- : unit = ()` was the inspiration? – Andrey Tyukin Aug 25 '18 at 11:40
  • No, we are not talking about "prevalence of 5%*rareEnough", but rather about the cost of "95%*veryCheap*rareEnough" :). Sure, it can be "fixed with a few lines of code" ... almost everything can. We could just write "C". Everything can be written in it, and will work much faster too ... It would also just take some extra code, but not more than just a few extra lines here and there :) – Dima Aug 25 '18 at 14:28
  • @Dima No. Not "everything can be written" in C. For the past 20-30 years or so, at least in the realm of statically typed languages, it's not about what programs can **do at runtime**, it's about what **the programmer can prove about their correctness at compile-time**. If a language can do everything a Turing machine (with standard I/O) can do, but does not allow me to encode sophisticated coherence constraints (e.g. in the form of types), then I simply don't care. C is nowhere near as expressive as Scala in this regard. That Turing-tarpit argument shouldn't be invoked any more; it should be deprecated. – Andrey Tyukin Aug 25 '18 at 14:46
  • @Dima Fixing `while ((line = read()) != null)` with an additional `var` requires `O(1)` additional lines of code (written and maintained in `O(1)` files by `O(1)` programmers). That's negligible. On the other hand, writing a library that provides type-safe immutable `List` in `C` is not just `O(exp(N))` or `O(gamma(N))` or `O(anythingHorrible(N))` overhead - it's completely impossible without writing a countably infinite number of `ListTypeNumberN`-classes, whereas in Scala this can be done trivially in a finite number of lines. – Andrey Tyukin Aug 25 '18 at 14:49
  • @Dima ...I'd therefore rather worry about things [like this](https://stackoverflow.com/q/51974633/2707792), which make the whole type system "somewhat broken". Little syntactical annoyances don't seem all that important, as long as the type system contains weird constructs and edge cases that make it unsound... Aand, by the way: it's not the first time that I notice that we both seem to like to have the last word in lengthy comment discussions. So I guess I better stop now by agreeing that there are valid arguments for `=` returning the assigned value :) – Andrey Tyukin Aug 25 '18 at 14:52
  • @AndreyTyukin Nah, who needs "type safe immutable list", that's _way_ more expensive, than pushing an extra value on stack every now and then. :) All you need is O(1) extra code for `malloc` and `free` ... – Dima Aug 25 '18 at 15:25
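The `while ((line = read()) != null)` idiom mentioned in the comments does not type-check in Scala, because `line = read()` evaluates to `()`. A hedged sketch of two common workarounds, assuming the input comes from a `java.io.BufferedReader`; `sampleReader`, `readAllLoop`, and `readAllIterator` are hypothetical helpers, and the `StringReader` input is purely illustrative:

```scala
import java.io.{BufferedReader, StringReader}
import scala.collection.mutable.ListBuffer

// Hypothetical input source used only for this illustration.
def sampleReader(): BufferedReader = new BufferedReader(new StringReader("a\nb\nc"))

// Workaround 1: the "extra line" fix, reading once before the loop
// and again at the end of each iteration.
def readAllLoop(in: BufferedReader): List[String] = {
  val out = ListBuffer.empty[String]
  var line = in.readLine()
  while (line != null) {
    out += line
    line = in.readLine()
  }
  out.toList
}

// Workaround 2: no explicit var; the iterator stops at the first null.
def readAllIterator(in: BufferedReader): List[String] =
  Iterator.continually(in.readLine()).takeWhile(_ != null).toList

println(readAllLoop(sampleReader()))     // List(a, b, c)
println(readAllIterator(sampleReader())) // List(a, b, c)
```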

In the immutable case, s += 2 is the same as s = s + 2. So if you make s += 2 evaluate to the new value of s, then you kind of have to make every assignment statement evaluate to the result of the assignment. Other languages do it this way, but it has historically led to bugs, famously with C code like:

if (x = 0) {
  ...
}

So I think it makes sense for s += 2 not to return the set in the immutable case.
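To illustrate the desugaring described above, a minimal sketch assuming Scala 2.13 or 3 (the vars `s`, `u` and the vals `r1`, `r2` are illustrative names):

```scala
// For an immutable Set held in a var, `s += 2` is shorthand for `s = s + 2`.
var s = Set(1)
var u = Set(1)
s += 2
u = u + 2
println(s == u) // true

// Both forms are assignments, so both evaluate to the Unit value ().
val r1: Unit = (s += 3)
val r2: Unit = (u = u + 3)

// Because assignments are Unit, the C pitfall is a compile error in Scala
// (a Unit vs Boolean type mismatch) rather than a silent bug:
// var x = 1
// if (x = 0) println("never")
```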

On the other hand, for the mutable case, += is just a method name, so it doesn't do assignment and can't really be responsible for this kind of bug in a meaningful way. And having it return itself enables the kind of builder pattern chaining that is sometimes useful.
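A sketch of the point that `+=` on a mutable collection is just an ordinary method, assuming Scala 2.13 or 3; the class `Tally` is hypothetical and exists only for illustration:

```scala
import scala.collection.mutable

// On a mutable Set, `s += 2` is operator syntax for the method call `s.+=(2)`;
// no reassignment of `s` takes place, which is why `s` can be a val.
val s = mutable.Set(1)
val same = s.+=(2)
println(same eq s) // true

// Any class can define `+=` the same way. Declaring the result type as
// `this.type` and returning `this` is what enables builder-style chaining.
final class Tally {
  private var total = 0
  def +=(n: Int): this.type = { total += n; this }
  def value: Int = total
}

val tally = new Tally
tally += 1 += 2 += 3
println(tally.value) // 6
```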

Joe K
  • `if (x = 0) { .. }` is a problem because `int` is somehow compatible with `boolean`, not because an assignment evaluates to its result. The latter, I think, is a perfectly useful feature. – Dima Aug 24 '18 at 22:18