While moving from a C# mindset to an F# one (as idiomatic as possible), I noticed an interesting difference.

In C#'s OOP, mutable world, the default set collection seems to be HashSet, which is unordered (the comparer it accepts is only for equality); if you want a sorted one, you have to use SortedSet.

In F#, however, the basic set is already sorted, because it requires the element type to implement both equality and comparison. Is there a specific reason for this? Why is there no unordered set among the language's core collections?
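
To make the difference concrete, here is a minimal sketch. The `Tag` record is a hypothetical type, marked `[<NoComparison>]` only to show what happens when an element type supports equality but not comparison.

```fsharp
open System.Collections.Generic

// F# Set: immutable, requires 'T : comparison, enumerates in sorted order.
let sorted = Set.ofList [ 3; 1; 2; 1 ]
let sortedItems = Set.toList sorted   // [1; 2; 3]

// .NET HashSet: mutable, needs only equality and hashing,
// and makes no promise about enumeration order.
let hashed = HashSet<int>([ 3; 1; 2; 1 ])
let hashedCount = hashed.Count        // 3, the duplicate 1 is dropped

// Hypothetical record that keeps structural equality but opts out of comparison.
[<NoComparison>]
type Tag = { Label: string }

// Fine for HashSet, which only needs equality and a hash code.
let tags = HashSet<Tag>([ { Label = "a" }; { Label = "a" } ])
let tagCount = tags.Count             // 1

// The next line would not compile, because Tag does not satisfy 'T : comparison:
// let tagSet = Set.ofList [ { Label = "a" } ]
```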

As a side note, I wonder whether it would be possible to have a set collection that disallows duplicates but prefers certain elements when deciding which duplicate to discard. Example: given a record { Name: string; Flag: Option<unit> }, inserting { Name = "foo"; Flag = None } and later { Name = "foo"; Flag = Some() } would leave only the latter element in the set (because its Flag is present).

knocte
  • 2
    C# doesn't have a default set collection. HashSet is more common simply because scenarios that don't need sorting are more common - lookups, additions, set operations. HashSet's performance is O(1) in those cases, while SortedSet and F#'s set are `O(logN)`. – Panagiotis Kanavos Aug 07 '19 at 12:19
  • 1
    You should probably be asking why F# added its own `set` type but not something like `HashSet`. It's not only about immutability after all - `O(1)` vs `O(logN)` lookups is a significant difference – Panagiotis Kanavos Aug 07 '19 at 12:24
  • 1
    `You should probably be asking why F# added its own set type but not something like HashSet`. this is what I asked, certainly – knocte Aug 07 '19 at 12:35
  • 3
    https://stackoverflow.com/questions/16216439/whats-the-difference-between-immutablesortedset-and-fsharp-set and https://stackoverflow.com/questions/28420373/why-does-f-set-need-icomparable may be worth a read. The fundamental distinction here appears to be mutability - `HashSet` is designed to be mutable, and thus its implementation optimises towards that scenario. F# does not. – mjwills Aug 07 '19 at 13:09
  • 1
    Stackoverflow .... sigh. Anyway, Set in F# is based on a red black tree which is implicitly sorted on the key. Red black trees are reasonably simple to implement as immutable structures and well understood. In addition, only `<` is needed to implement a red black tree, with the assumption that `not x < y and not y < x` means that the values are equal. Set is actually quite fast (O notation doesn't tell the full story) but will do worse than `Dictionary` because sets are immutable and the F# generic `<` operator is slow. As for immutable hashsets, there are variants based on hash tries that are decent. – Just another metaprogrammer Aug 07 '19 at 13:14
  • thanks @mjwills, the second link kinda replies to my first question; I guess my side-note question is impossible to do without implementing a custom collection right? – knocte Aug 07 '19 at 13:16
  • 1
    @Justanothermetaprogrammer that's not an adequate explanation of why F# doesn't have a hash set. HashSet is still faster. Both C# and F# collections are generic and the comparison is performed by the runtime anyway, so there's no difference in the two implementations. Immutability matters for another reason - both SortedSet (a red-black tree itself) and HashSet's set operations modify the target object. F#'s operations though have to produce a *new* set. But HashSet is still faster for insertions so .... there's probably another reason, like storage – Panagiotis Kanavos Aug 07 '19 at 13:18
  • 1
    No, you can't do that, because it would confuse users who expect a certain behavior from `add`. However, you can implement your own method `merge` that merges like that and put it in a Set module for convenience. – Just another metaprogrammer Aug 07 '19 at 13:18
  • 1
    I kind of ran out of space in the comment (because I can't answer the question) so I had to edit some of my other points so my answer was incomplete. Personally I believe it's convention in most OCaml languages that the default set is sorted but I don't know 100%. – Just another metaprogrammer Aug 07 '19 at 13:19
  • 1
    `I guess my side-note question is impossible to do without implementing a custom collection right?` Yes, that is correct. Not hard to write (and a bit weird), hence likely why probably not provided out of the box. – mjwills Aug 07 '19 at 13:20
  • 2
    @knocte Petricek's first bullet is the actual answer - HashSet uses a buffer internally just like a list, so "removing" or adding an item is *fast* as long as you don't have to reallocate the buffer - just write something to a free spot, or "unmark" it. Set operations that produce a *new* (hash)set though are probably more expensive than walking two trees, the way F#'s set or SortedSet do. Check for example F#'s intersect implementation, [intersectAux](https://github.com/fsharp/fsharp/blob/master/src/fsharp/FSharp.Core/set.fs#L317). Essentially, it walks both trees at the same time – Panagiotis Kanavos Aug 07 '19 at 13:21
  • 1
    Anyway, for more open ended questions ask the questions on quora.com. They won't get closed – Just another metaprogrammer Aug 07 '19 at 13:22
  • @mjwills F# can use any .NET collection. F# itself *uses* HashSets internally [as this Github search shows](https://github.com/fsharp/fsharp/search?q=hashset&unscoped_q=hashset), eg to implement [Seq.distinct](https://github.com/fsharp/fsharp/blob/master/src/fsharp/FSharp.Core/seq.fs#L1088), [Array.distinct](https://github.com/fsharp/fsharp/blob/master/src/fsharp/FSharp.Core/array.fs#L252) and [List.except](https://github.com/fsharp/fsharp/blob/master/src/fsharp/FSharp.Core/list.fs#L422) – Panagiotis Kanavos Aug 07 '19 at 13:33
  • @mjwills as a core F# type,no. That's what this question asks. `Any specific reason for this? Why not having an unordered set in the main collections for this language?` The reason is probably far more complex than `it's fast enough` or `because OCaml doesn't have one`. That's something only people like Petricek can answer though – Panagiotis Kanavos Aug 07 '19 at 14:10
  • 1
    Close voters, unless you're named Don Syme or Tomas Petricek **wait**. This can be a great question with a great answer – Panagiotis Kanavos Aug 07 '19 at 14:13
  • 2
    I think the mistake was labeling the question with the `.net` tag; if you look at the close-voters the majority of them are C# people; while in general when I make F# questions and only tag `f#` most of my questions are very welcome and I get such a big amount of quality responses; my hat off to the F# community – knocte Aug 07 '19 at 14:46
  • Great question. Vote to reopen. – s952163 Aug 08 '19 at 03:14
  • @knocte but it's a .net question too. :D – s952163 Aug 08 '19 at 05:49
  • 3
    Possible duplicate of [Why does F# Set need IComparable?](https://stackoverflow.com/questions/28420373/why-does-f-set-need-icomparable) – knocte Aug 08 '19 at 07:31
  • 2
    Almost a duplicate of [Why does F# Set need IComparable?](https://stackoverflow.com/questions/28420373/why-does-f-set-need-icomparable), but IMHO not an *exact* duplicate. – rmunn Aug 08 '19 at 07:32
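
The side note from the question, together with the custom-helper suggestion in the comments above, could be sketched roughly as follows. This is only one possible approach under stated assumptions: `Item` mirrors the record from the question, `addPreferring` is a hypothetical helper name, and a `Map` keyed by `Name` stands in for the "set" so that uniqueness is decided by name alone.

```fsharp
// Record from the question; Flag is either present (Some ()) or absent (None).
type Item = { Name: string; Flag: unit option }

// Hypothetical helper: keep one item per Name, and never let an unflagged
// item replace an already stored flagged one.
let addPreferring (item: Item) (items: Map<string, Item>) : Map<string, Item> =
    match Map.tryFind item.Name items with
    | Some existing when Option.isSome existing.Flag && Option.isNone item.Flag ->
        items                             // stored item wins, nothing changes
    | _ ->
        Map.add item.Name item items      // new name, or the newcomer is preferred

let result =
    Map.empty
    |> addPreferring { Name = "foo"; Flag = None }
    |> addPreferring { Name = "foo"; Flag = Some () }
    |> addPreferring { Name = "foo"; Flag = None }

// The single surviving item is the flagged one.
let values = result |> Map.toList |> List.map snd
```

Keeping this as a separate function, rather than changing what `Set.add` does, leaves the standard semantics of the built-in set untouched.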

1 Answer

F# Set happens to be sorted, but that is more of an implementation detail resulting from the choice of the underlying data structure, and it should not generally be relied upon.

F# sets and maps are based on a variant of the AVL tree, a structure that maintains the invariant that the elements stored in the tree are sorted. It requires the comparison constraint because lookup in this tree depends on direct comparisons between elements to select the subtree to traverse.
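
As a rough illustration of why comparison is needed, here is a minimal, unbalanced binary search tree. It is a sketch only, not the actual FSharp.Core implementation (which also rebalances), and the names `Tree`, `insert`, `contains` and `toList` are made up for this example.

```fsharp
// Simplified immutable binary search tree (no rebalancing).
type Tree<'T> =
    | Empty
    | Node of Tree<'T> * 'T * Tree<'T>

// `compare` is what forces the 'T : comparison constraint:
// every step picks the left or right subtree by comparing elements.
let rec insert x tree =
    match tree with
    | Empty -> Node (Empty, x, Empty)
    | Node (l, v, r) ->
        let c = compare x v
        if c < 0 then Node (insert x l, v, r)
        elif c > 0 then Node (l, v, insert x r)
        else tree                         // already present, no duplicate added

let rec contains x tree =
    match tree with
    | Empty -> false
    | Node (l, v, r) ->
        let c = compare x v
        if c < 0 then contains x l
        elif c > 0 then contains x r
        else true

// An in-order walk yields the elements sorted, as a side effect of the layout.
let rec toList tree =
    match tree with
    | Empty -> []
    | Node (l, v, r) -> toList l @ [ v ] @ toList r

let tree = List.fold (fun acc x -> insert x acc) Empty [ 3; 1; 2; 1 ]
let ordered = toList tree                 // [1; 2; 3]
```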

The selling point of these structures, however, is that they make it cheap to implement reasonably efficient, immutable versions of maps and sets, and that's what F# needed at a time when the wider .NET platform didn't offer any alternatives.
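
A small sketch of what "immutable" means in practice for `Set`: every operation returns a new set and leaves the old one untouched. The value names here are arbitrary.

```fsharp
let original = Set.ofList [ 1; 2; 3 ]

// Set.add does not modify `original`; it returns a new set.
let extended = Set.add 4 original

let originalCount = Set.count original    // still 3
let extendedCount = Set.count extended    // 4

// Set operations likewise produce new sets from existing ones.
let common = Set.intersect original extended
let commonItems = Set.toList common       // [1; 2; 3]
```

With a tree-based representation, the new set can typically reuse most of the nodes of the old one, which is what keeps this style affordable.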

Note that this is not the only viable choice in this context. JVM functional languages like Clojure and Scala opted for a different data structure as the base for their maps: the hash array mapped trie. It is also immutable and persistent, arguably more complex to implement, and arguably more efficient for larger collection sizes, but it stores elements unordered. Unlike in AVL trees, traversal is based on hashes, so the comparison constraint is not required.

So if you already know that your priority is immutability, a sorted set is actually easier to implement than an unsorted set.

scrwtp