
In F#, tuples are reference types. F# also has struct tuples, which are value types. Struct tuples can offer speed advantages, as the code below demonstrates.

There is a possible reason why a memoized function taking a struct tuple could be slower than one taking a reference tuple. A memoized function must check whether a key is already present in a dictionary. If the key is a reference type, that check could be a cheap reference-equality comparison. If the key is a value type wrapping a "heavy" object, however, the structural equality comparison may be slow. This effect did not show up in the examples below; it seems to be a theoretical possibility that I was unable to reproduce.

Question: If a "heavy" tuple is passed as the argument to a memoized function, can it be better to have the function accept a standard (reference) tuple as its argument instead of a struct tuple? If yes, is the reasoning in the paragraph above a correct explanation for the speed differential?
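One way to probe which kind of comparison is actually in play is to compare two structurally equal but distinct tuples directly. This sketch is not part of the original benchmark; it only illustrates that F# reference tuples (`System.Tuple`) compare structurally, not by reference:

```fsharp
// Two tuples with equal contents but allocated separately on the heap.
let a = ([1.0; 2.0], [3.0; 4.0])
let b = ([1.0; 2.0], [3.0; 4.0])

printfn "%b" (a = b)                      // true: F# tuple equality is structural
printfn "%b" (obj.ReferenceEquals (a, b)) // false: two distinct heap objects
```

So even for a reference-tuple key, a dictionary lookup may end up doing a structural comparison rather than a pure reference check.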

open System.Collections.Concurrent

let inline memoizeConcurrent f =
    let dict = ConcurrentDictionary()
    // Wrap the key in Some so that keys whose runtime representation is null
    // (e.g. None or unit) are still valid dictionary keys. The Lazy ensures
    // f runs at most once per key, even under concurrent first access.
    fun x -> dict.GetOrAdd(Some x, lazy (f x)).Force()

let time (msg: string) (f: 'S->'T) (x: 'S) : 'T =
    let stopWatch = System.Diagnostics.Stopwatch.StartNew()

    let f_x = f x

    let ms = stopWatch.Elapsed.TotalMilliseconds

    let msg' =
        if ms < 10000.0 then sprintf " - Elapsed time: %f ms" ms
        else sprintf " - Elapsed time: %f seconds" (ms / 1000.0)

    printfn "%s" (msg + msg')

    f_x

let xs = [0.0..10000.0]
let ys = xs |> List.map (fun x -> x + 1.0)

let tup : list<float>*list<float> = (xs, ys)
let tupStruct : struct (list<float>*list<float>) = struct (xs, ys)

let n = 100 // other values of n will be tried

let foo ((x,y): list<float>*list<float>) : unit =
    [1..n]
    |> List.iter (fun _ -> List.map2 (+) x y |> List.average |> ignore)

let fooStruct (struct(x, y): struct (list<float>*list<float>)) : unit =
    [1..n]
    |> List.iter (fun _ -> List.map2 (+) x y |> List.average |> ignore)


let fooMemo : list<float>*list<float>->unit =
    memoizeConcurrent foo

let fooStructMemo : struct (list<float>*list<float>)->unit =
    memoizeConcurrent fooStruct

time "foo" foo tup
time "fooMemo1" fooMemo tup
time "fooMemo2" fooMemo tup
printfn "\n"
time "fooStruct" fooStruct tupStruct
time "fooStructMemo1" fooStructMemo tupStruct
time "fooStructMemo2" fooStructMemo tupStruct

// ----------------- n = 1 ----------------

foo - Elapsed time: 16.947700 ms
fooMemo1 - Elapsed time: 27.495900 ms
fooMemo2 - Elapsed time: 5.575100 ms


fooStruct - Elapsed time: 1.942500 ms
fooStructMemo1 - Elapsed time: 2.866500 ms
fooStructMemo2 - Elapsed time: 5.978400 ms

// ----------------- n = 10 ----------------

foo - Elapsed time: 12.382000 ms
fooMemo1 - Elapsed time: 32.707500 ms
fooMemo2 - Elapsed time: 5.172900 ms


fooStruct - Elapsed time: 4.746200 ms
fooStructMemo1 - Elapsed time: 7.651900 ms
fooStructMemo2 - Elapsed time: 2.936400 ms


// ----------------- n = 100 ----------------

foo - Elapsed time: 44.160700 ms
fooMemo1 - Elapsed time: 48.957600 ms
fooMemo2 - Elapsed time: 5.695200 ms


fooStruct - Elapsed time: 20.628500 ms
fooStructMemo1 - Elapsed time: 25.449000 ms
fooStructMemo2 - Elapsed time: 3.104300 ms

// ----------------- n = 100000 ----------------

foo - Elapsed time: 20.120753 seconds
fooMemo1 - Elapsed time: 18.981092 seconds
fooMemo2 - Elapsed time: 5.362500 ms


fooStruct - Elapsed time: 18.776220 seconds
fooStructMemo1 - Elapsed time: 19.005812 seconds
fooStructMemo2 - Elapsed time: 2.761800 ms

A puzzle: fooStructMemo2 took about 5 ms when n = 1 but only 2 or 3 ms for the other values of n. This is not random noise; the differential is consistent across runs.

Soldalma
  • Another possible explanation for the difference is that a dictionary needs to store its contents on the heap, which means that if you pass it a value tuple, it needs to be boxed and so a heap allocation occurs, losing the advantage of value tuples (being stored on the stack, which is already in the CPU cache, so is fast to access). I don't know .NET internals all that well and haven't tried to measure this myself, so if I'm wrong on this, I'm sure someone will correct me. – rmunn Jun 14 '18 at 02:19
  • I profiled your code, and while I didn't see an obvious reason why the struct version would be slower (in fact, it eliminates calls to `Tuple.ctor` and `Tuple.get_ItemX`), it does seem that using a `Lazy` for the memoized value is causing a large slowdown. Replacing the lazy with a lambda reduced the memoized runtime by a factor of 3 on my system, and it reduced the difference between the class tuple and the struct tuple from 0.3ms to 0.02ms for me. – Aaron M. Eshbach Jun 14 '18 at 13:26
  • @AaronM.Eshbach - I got the memoization function from . It looks like there are problems with concurrent memoization that are solved using lazy evaluation. For example or . – Soldalma Jun 14 '18 at 18:34
  • @Soldalma Using a `ConcurrentDictionary` solves most of the concurrency issues itself. I think the only remaining issue is that the function may run more than once in some edge cases, but the overhead is much lower than with lazy, so unless you need to ensure exactly-once execution of the memoized function, I would go with the lambda for performance. – Aaron M. Eshbach Jun 14 '18 at 18:42
  • I've shared my memoize function, if it helps: http://fssnip.net/7UT – Aaron M. Eshbach Jun 14 '18 at 19:02
  • @AaronM.Eshbach Thanks. Looks very useful. – Soldalma Jun 14 '18 at 22:46
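A sketch of the lambda-based variant Aaron describes in the comments (my reconstruction, not the exact code from http://fssnip.net/7UT): passing a plain function to `GetOrAdd` avoids allocating a `Lazy` on every call, at the cost that `f` may run more than once for the same key under concurrent first access.

```fsharp
open System.Collections.Concurrent

// Note: unlike the original memoizeConcurrent, the key is not wrapped in
// Some, so keys whose runtime representation is null (e.g. None, unit)
// are not supported; tuples are fine.
let memoizeConcurrentLambda (f: 'a -> 'b) =
    let dict = ConcurrentDictionary<'a, 'b>()
    fun x -> dict.GetOrAdd(x, fun key -> f key)
```

Usage mirrors the original: `let fooMemoLambda = memoizeConcurrentLambda foo`, then call `fooMemoLambda tup` as before.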

0 Answers