
This was quite an unpleasant surprise:

scala> Set(1, 2, 3, 4, 5)       
res18: scala.collection.immutable.Set[Int] = Set(4, 5, 1, 2, 3)
scala> Set(1, 2, 3, 4, 5).toList
res25: List[Int] = List(5, 1, 2, 3, 4)

The example by itself suggests a "no" answer to my question. Then what about ListSet?

scala> import scala.collection.immutable.ListSet
scala> ListSet(1, 2, 3, 4, 5)
res21: scala.collection.immutable.ListSet[Int] = Set(1, 2, 3, 4, 5)

This one seems to work, but should I rely on this behavior? What other data structure is suitable for an immutable collection of unique items, where the original order must be preserved?

By the way, I do know about the distinct method on List. The problem is that I want to enforce uniqueness of items (while preserving their order) at the interface level, so using distinct would mess up my neat design.
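One way to enforce uniqueness at the interface level, sketched here under the assumption that a small wrapper type is acceptable, is a class with a private constructor, so that every instance has gone through `distinct`. `UniqueList` and its methods are hypothetical names, not part of the Scala library:

```scala
// Hypothetical wrapper: the constructor is private, so the only way to get a
// UniqueList is through the companion, which removes duplicates.
final class UniqueList[A] private (val values: List[A]) {
  // Appending an element that is already present returns the list unchanged.
  def :+(elem: A): UniqueList[A] =
    if (values.contains(elem)) this else new UniqueList(values :+ elem)

  override def toString: String = values.mkString("UniqueList(", ", ", ")")
}

object UniqueList {
  def apply[A](elems: A*): UniqueList[A] = from(elems)

  // distinct keeps the first occurrence of each element, in the original order.
  def from[A](elems: Iterable[A]): UniqueList[A] =
    new UniqueList(elems.toList.distinct)
}
```

Because `distinct` keeps first occurrences, `UniqueList(3, 1, 3, 2, 1).values` is `List(3, 1, 2)`. Both `:+` and `contains` are linear in the list length, which is probably only acceptable for small collections.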

EDIT

ListSet doesn't seem very reliable either:

scala> ListSet(1, 2, 3, 4, 5).toList
res28: List[Int] = List(5, 4, 3, 2, 1)

EDIT2

In my search for a perfect design I tried this:

scala> class MyList[A](list: List[A]) { val values = list.distinct }
scala> implicit def toMyList[A](l: List[A]) = new MyList(l)
scala> implicit def fromMyList[A](l: MyList[A]) = l.values     

Which actually works:

scala> val l1: MyList[Int] = List(1, 2, 3)
scala> l1.values
res0: List[Int] = List(1, 2, 3)

scala> val l2: List[Int] = new MyList(List(1, 2, 3))
l2: List[Int] = List(1, 2, 3)

The problem, however, is that I do not want to expose MyList outside the library. Is there any way to have the implicit conversion applied when overriding? For example:

trait T { def l: MyList[_] }
object O extends T { val l: MyList[_] = List(1, 2, 3) }
scala> O.l mkString(" ")  // Let's test the implicit conversion
res7: String = 1 2 3      

I'd like to do it like this:

object O extends T { val l = List(1, 2, 3) }  // Doesn't work
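A possible way to avoid exposing a wrapper type at all, sketched here rather than taken from the answers below, is to let implementers supply a plain List through a protected member and have the trait expose only the de-duplicated result. This is a restructured version of the question's `T` and `O`; the member name `rawL` is hypothetical:

```scala
trait T {
  // Implementations supply any List here; duplicates are allowed.
  protected def rawL: List[Int]

  // lazy, so it is computed on first access, after the implementing
  // object has initialised rawL. final, so implementers cannot bypass it.
  final lazy val l: List[Int] = rawL.distinct
}

object O extends T {
  protected val rawL = List(1, 2, 3, 2, 1)
}
```

The `lazy` matters: a strict `val l = rawL.distinct` in the trait would be initialised before `O`'s own `rawL` and would throw a NullPointerException.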
Vilius Normantas
  • Where could I read about Scala's collections in greater depth than "Programming in Scala" and the Scaladocs? – Vilius Normantas Mar 09 '11 at 12:33
    This is a good place to start learning about Scala collections: http://www.scala-lang.org/docu/files/collections-api/collections.html – pr1001 Mar 09 '11 at 12:42
    Sets have by (mathematical) definition no order and most languages stick to that convention. Curious, why would you expect them to have? Order is the only substantial difference between `Seq` and `Set` besides uniqueness! – Raphael Mar 09 '11 at 17:55
    In this case I'm not too concerned about the mathematical properties of my data structure. I just need an immutable collection of distinct items with their original order preserved. Call it a list, set, vector, table or "truckload" :) – Vilius Normantas Mar 10 '11 at 11:10

3 Answers


That depends on the Set you are using. If you do not know which Set implementation you have, then the answer is simply, no you cannot be sure. In practice I usually encounter the following three cases:

  1. I need the items in the Set to be ordered. For this I use classes mixing in the SortedSet trait, which, when you use only the standard Scala API, is always a TreeSet. It guarantees that the elements are ordered according to an implicit Ordering (for element types extending the Ordered trait, their compare method). You get a (very) small performance penalty for the sorting, as the runtime of inserts/retrievals is now logarithmic rather than (almost) constant as with the HashSet (assuming a good hash function).

  2. You need to preserve the order in which the items are inserted. Then you use the LinkedHashSet (in the Scala standard library, scala.collection.mutable.LinkedHashSet). It is practically as fast as the normal HashSet and needs only a little more storage space for the additional links between elements.

  3. You do not care about order in the Set. So you use a HashSet. (That is the default when using the Set.apply method like in your first example)
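The three cases above might look like this with the Scala 2.13 standard library (a sketch; `SetOrderDemo` is just a hypothetical container object). Note that LinkedHashSet is only available in the mutable package:

```scala
import scala.collection.immutable.{HashSet, TreeSet}
import scala.collection.mutable

object SetOrderDemo {
  // 1. Sorted: TreeSet iterates in the order given by the implicit Ordering[Int].
  val sorted: TreeSet[Int] = TreeSet(5, 3, 1, 4, 2)

  // 2. Insertion order: LinkedHashSet lives in scala.collection.mutable.
  val linked: mutable.LinkedHashSet[Int] = mutable.LinkedHashSet(5, 3, 1, 4, 2)

  // Re-adding an existing element leaves it where it was; new elements go to the end.
  def linkedAfterReinsert: List[Int] = {
    val s = mutable.LinkedHashSet(5, 3, 1)
    s += 5
    s += 7
    s.toList
  }

  // 3. No order guarantee: HashSet iteration order depends on the hash codes.
  val hashed: HashSet[Int] = HashSet(5, 3, 1, 4, 2)
}
```

Only `sorted` and `linked` have an iteration order you can rely on; for `hashed`, any order you happen to observe is an implementation detail.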

All this applies to Java as well: Java has TreeSet, LinkedHashSet and HashSet, and the corresponding interfaces SortedSet and plain Set (with Comparable playing the role of Scala's Ordered).

Christoph Henkelmann
    If you look at [the source for LinkedHashSet](https://lampsvn.epfl.ch/trac/scala/browser/scala/tags/R_2_8_1_final/src//library/scala/collection/mutable/LinkedHashSet.scala#L1) you see that all it does is mix in the basic Set traits and add a list member which stores the order of the elements. So, following that pattern, you could simply write your own immutable LinkedHashSet using the corresponding base traits and classes from the immutable package. – Christoph Henkelmann Mar 09 '11 at 13:27
    Insertion order is not a priori well-defined: what happens if the same element is added twice, i.e. where will the one representative be put? – Raphael Mar 09 '11 at 18:03
    Also, relying on the implementation of a given type is not a good idea. You need something that guarantees order through its interface definition (contract). I cannot think of a trivial wrapper around existing immutable collections that does that. You can use `TreeMap[(Int,T)]` (keeping track of insertion order in the first component), but that implies delegating all set methods explicitly. – Raphael Mar 09 '11 at 18:12
    The Scala API doc does (regrettably) not mention it, but according to the code, inserting an element again does not change its position (which is how the Java `LinkedHashMap` behaves as well): `if (addEntry(elem)) { ordered += elem; true } else false`. The Java API is more precise: “Note that insertion order is not affected if a key is re-inserted into the map” – Christoph Henkelmann Mar 09 '11 at 18:13
    Sometimes it is essential to rely on the implementation of a given type. When I need a list with constant-time prepending, I need to use a linked list, not an array-backed list. This is why in such cases I explicitly use the type of the implementation, not an interface or trait. If I allowed, say, an array-backed list to be handed to my code, I could get quadratic runtimes. So my code relies on the implementation of the list. If I want to make clear that insertion order affects the behavior of my code, I use LinkedHashSet as the type. If I do not care, I use Set. – Christoph Henkelmann Mar 09 '11 at 19:42
  • But still, you would code against the specification, or the guaranteed properties, of `LinkedList`, not against the code, right? – Raphael Mar 11 '11 at 19:05
  • You are right, of course I wouldn't do that. `LinkedList` by definition has certain properties I rely on. By explicitly using `LinkedList`, I state that I rely on those specific properties. IMHO, for the present discussion it is OK to rely on the specific properties of `LinkedHashSet` as long as I explicitly use a `LinkedHashSet`, not a `Set`. The fact that the ScalaDoc does not mention that insertion order is preserved is just because the doc is still sparse in many places. – Christoph Henkelmann Mar 11 '11 at 19:41
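Following the idea of writing an immutable counterpart of LinkedHashSet, here is a minimal sketch that does not mix in the collection traits at all: it keeps a Vector for insertion order and an immutable Set for membership. `InsertionOrderedSet` is a hypothetical name; re-inserting an element keeps its original position, as described for LinkedHashSet above:

```scala
// Hypothetical immutable set that iterates in insertion order.
// `order` records first-insertion order; `members` gives fast membership tests.
final class InsertionOrderedSet[A] private (order: Vector[A], members: Set[A]) {
  def contains(elem: A): Boolean = members.contains(elem)

  // Re-inserting an existing element keeps its original position.
  def +(elem: A): InsertionOrderedSet[A] =
    if (members.contains(elem)) this
    else new InsertionOrderedSet(order :+ elem, members + elem)

  // Removal is O(n) because the element must be filtered out of the Vector.
  def -(elem: A): InsertionOrderedSet[A] =
    if (!members.contains(elem)) this
    else new InsertionOrderedSet(order.filterNot(_ == elem), members - elem)

  def size: Int = order.size
  def iterator: Iterator[A] = order.iterator
  def toList: List[A] = order.toList
}

object InsertionOrderedSet {
  def empty[A]: InsertionOrderedSet[A] =
    new InsertionOrderedSet[A](Vector.empty, Set.empty)

  // Elements are added left to right, so duplicates keep their first position.
  def apply[A](elems: A*): InsertionOrderedSet[A] =
    elems.foldLeft(empty[A])(_ + _)
}
```

Adding and membership tests are effectively constant time (Vector append and immutable Set lookup), while removal is linear. Mixing in the immutable Set traits, as suggested in the first comment, would make it usable wherever a Set is expected, at the cost of more code.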

It is my belief that you should never rely on the order in a set, in any language.

Apart from that, have a look at this question which talks about this in depth.

Michael Piefel
  • Thank you, I didn't find the other question. My bad. – Vilius Normantas Mar 09 '11 at 12:41
    In C++, `std::set` is guaranteed to be ordered. -1. – Fred Foo Mar 09 '11 at 12:42
    @larsmans...for one example? one example that *doesn't* provide the ordering the OP wants (i.e., insertion order)? really? – Carl Mar 09 '11 at 13:01
  • Agree with @larsmans. In C++, std::set is ordered while std::unordered_set is not ordered – Newbie Jan 27 '16 at 01:13
    Christoph Henkelmann's answer is the right answer. Uniqueness and sortedness are orthogonal properties and it's totally legitimate to want both in a collection class (in any language). – user48956 Jul 14 '16 at 19:04
  • Although I upvoted Christoph’s answer long ago, it’s really not _more correct_ than mine, but rather better in the sense that it’s _more elaborate_. The question, after all, was “can I rely on the order in a `Set`”, and not “can I rely on the order in a `SortedSet`”. – Michael Piefel Jul 18 '16 at 06:31

ListSet will always return elements in the reverse order of insertion because it is backed by a List, and the optimal way of adding elements to a List is by prepending them.

Immutable data structures are problematic if you want first in, first out (a queue). You can get O(log n) or amortized O(1). Given the apparent need to build the set and then produce an iterator out of it (i.e., you'll first put in all the elements, then you'll remove all of them), I don't see any way to amortize it.

You can rely on a ListSet always returning elements in last in, first out order (a stack). If that suffices, go for it.
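If what you need is the order in which items were first seen, a common immutable workaround is to accumulate in LIFO order (prepending to a List, which the answer above describes as what ListSet does internally) and reverse once at the end. A minimal sketch, with `orderedDistinct` as a hypothetical helper; its result is the same as `List.distinct`:

```scala
object OrderedDistinct {
  // Prepend each unseen element (O(1)) and track membership in an immutable Set.
  // The accumulated list is in last-in, first-out order, so reverse it once at the end.
  def orderedDistinct[A](elems: Iterable[A]): List[A] = {
    val (reversed, _) =
      elems.foldLeft((List.empty[A], Set.empty[A])) { case ((acc, seen), x) =>
        if (seen.contains(x)) (acc, seen) else (x :: acc, seen + x)
      }
    reversed.reverse
  }
}
```

Each step costs an O(1) prepend plus an effectively constant Set lookup, and the single reverse is O(n), so building the whole collection is linear.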

Daniel C. Sobral
  • I considered ListSet for a while, but there was a problem: I had to reverse the list somewhere to restore the original order. Unfortunately you cannot do that inside an immutable trait. And if I convert my trait to a class (which I had to do in the end), I can just as easily do `list = providedList.distinct` or simply throw an exception on duplicates... – Vilius Normantas Mar 10 '11 at 11:18