I am wondering what's the difference between ImmutableSortedSet and the native FSharp Set?
They are generally very similar. The main difference is that the F# Set supports fast set-theoretic operations (union, intersection and difference).
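For reference, here is a minimal sketch of those three operations on the F# Set, using only functions from the standard Set module. The values a and b are made up for illustration.

```fsharp
// Two small overlapping sets (illustrative values only).
let a = set [1; 2; 3; 4]
let b = set [3; 4; 5; 6]

let u = Set.union a b        // set [1; 2; 3; 4; 5; 6]
let i = Set.intersect a b    // set [3; 4]
let d = Set.difference a b   // set [1; 2]

printfn "%A %A %A" u i d
```

The operators a + b and a - b are shorthand for Set.union and Set.difference.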
Here is a simple F# program that measures the performance of some common operations:
    open System.Collections.Immutable

    // Repeat forever so that the timings settle after JIT warm-up.
    while true do
      // Benchmark the BCL ImmutableSortedSet.
      do
        let timer = System.Diagnostics.Stopwatch.StartNew()
        let cmp = LanguagePrimitives.FastGenericComparer<int>
        let mutable s1 = ImmutableSortedSet.Create<int>(cmp)
        let mutable s2 = ImmutableSortedSet.Create<int>(cmp)
        for i in 1..1000000 do
          s1 <- s1.Add i
        for i in 1000000..2000000 do
          s2 <- s2.Add i
        printfn "BCL ImmutableSortedSet: add in %fs" timer.Elapsed.TotalSeconds
        timer.Restart()
        for _ in 1..10 do
          for i in 1..1000000 do
            ignore(s1.Contains i)
        printfn "BCL ImmutableSortedSet: contains in %fs" timer.Elapsed.TotalSeconds
        timer.Restart()
        let s = s1.Union s2
        printfn "BCL ImmutableSortedSet: union in %fs" timer.Elapsed.TotalSeconds
      // Benchmark the F# Set with the same workload.
      do
        let timer = System.Diagnostics.Stopwatch.StartNew()
        let mutable s1 = Set.empty
        let mutable s2 = Set.empty
        for i in 1..1000000 do
          s1 <- s1.Add i
        for i in 1000000..2000000 do
          s2 <- s2.Add i
        printfn "F# Set: add in %fs" timer.Elapsed.TotalSeconds
        timer.Restart()
        for _ in 1..10 do
          for i in 1..1000000 do
            ignore(s1.Contains i)
        printfn "F# Set: contains in %fs" timer.Elapsed.TotalSeconds
        timer.Restart()
        let s = Set.union s1 s2
        printfn "F# Set: union in %fs" timer.Elapsed.TotalSeconds
On my machine, I get:
                BCL ImmutableSortedSet    F# Set
    add         2.6s                      3.0s
    contains    2.1s                      1.9s
    union       1.1s                      0.00004s
So the F# Set is slightly slower to construct, slightly faster to search, and orders of magnitude faster for the set-theoretic union operation.
What is the internal implementation of the F# Map? Is it a red-black tree as claimed here or an AVL tree as found out here?
As both of your links state, F# uses AVL trees.
This is actually relevant to the performance figures above. Each branch of an AVL tree stores the maximum height of its subtrees, so subtrees can be rebalanced without examining them in full. In contrast, each branch of a red-black tree stores only a single bit of data, so rebalancing subtrees requires the entire tree to be traversed, which is asymptotically slower. In layman's terms, the union of two same-sized non-overlapping sets entails little more than creating a new branch containing the two existing trees. Note that the Union method in the BCL API cannot even express this: it accepts an abstract IEnumerable rather than a concrete set.
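To make the "new branch containing the two existing trees" idea concrete, here is a minimal sketch of a height-caching tree. The Tree type and the node, build and tryJoin functions are hypothetical names invented for this illustration; this is not the FSharp.Core implementation, and it only covers the easy case where the two heights differ by at most one and a pivot value lies strictly between the two sets.

```fsharp
// Hypothetical AVL-style tree for illustration only.
// Each node caches its height: Node (left, value, right, height).
type Tree =
    | Empty
    | Node of Tree * int * Tree * int

// O(1): reads the cached height, no traversal needed.
let height tree =
    match tree with
    | Empty -> 0
    | Node (_, _, _, h) -> h

// Create a node and compute its height from its children's cached heights.
let node left value right =
    Node (left, value, right, 1 + max (height left) (height right))

// Demo helper: build a balanced tree holding lo..hi.
let rec build lo hi =
    if lo > hi then Empty
    else
        let mid = lo + (hi - lo) / 2
        node (build lo (mid - 1)) mid (build (mid + 1) hi)

// Join two trees around a pivot, assuming every element of `left` is
// below `pivot` and every element of `right` is above it.
// When the cached heights differ by at most one, the result is already
// balanced, so one new node suffices and neither subtree is visited.
// Other cases would need rotations, which this sketch leaves out.
let tryJoin left pivot right =
    if abs (height left - height right) <= 1 then Some (node left pivot right)
    else None

let lower = build 1 7     // height 3
let upper = build 9 15    // height 3
let joined = tryJoin lower 8 upper

match joined with
| Some t -> printfn "joined in O(1), height %d" (height t)
| None -> printfn "heights too different for the simple case"
```

The balance check costs two cached-height reads regardless of how many elements the trees hold, which is consistent with the near-zero union time measured above.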
In addition, why don't the MSDN documents state clearly what the actual data structure is for each library collection? I know these are implementation details and are subject to change. My point is that if they don't want to bind a library data type to a particular well-known data structure, they should at least offer a summary of the performance of every method in terms of complexity?
I agree that complexities in the docs would be good.