
I'm using F# and have an AsyncSeq<Async<'t>>. Each item will take a varying amount of time to process and does I/O that's rate-limited.

I want to run all the operations in parallel and then pass them down the chain as an AsyncSeq<'t> so I can perform further manipulations on them and ultimately AsyncSeq.fold them into a final outcome.

The following AsyncSeq operations almost meet my needs:

  • mapAsyncParallel - runs the work in parallel, but the parallelism is unconstrained (and it preserves order, which I don't need)
  • iterAsyncParallelThrottled - runs in parallel with a max degree of parallelism, but doesn't let me return results (order is not preserved, which is fine for me)

What I really need is something like a mapAsyncParallelThrottled. To be more precise, the operation would be named mapAsyncParallelThrottledUnordered.

Things I'm considering:

  1. use mapAsyncParallel but use a Semaphore within the function to constrain the parallelism myself, which is probably not going to be optimal in terms of concurrency, because the results get buffered in order to reorder them.
  2. use iterAsyncParallelThrottled and do some ugly folding of the results into an accumulator as they arrive, guarded by a lock, kinda like this - but I don't need the ordering, so it won't be optimal.
  3. build what I need by enumerating the source and emitting results via AsyncSeqSrc like this. I'd probably keep a set of Async.StartAsTask tasks in flight, topping it up to maxDegreeOfParallelism, and start another each time Task.WaitAny gives me something to AsyncSeqSrc.put.

Surely I'm missing a simple answer and there's a better way?

Failing that, would love someone to sanity check my option 3 in either direction!

I'm open to using AsyncSeq.toAsyncEnum and then use an IAsyncEnumerable way of achieving the same outcome if that exists, though ideally without getting into TPL DataFlow or RX land if it can be avoided (I've done extensive SO searching for that without results...).

Ruben Bartelink
  • `AsyncSeq` is inherently ordered, so perhaps it's not the best tool for this job. Instead, can you put the operations in a plain `seq<Async<'t>>` and call [`Async.Parallel`](https://fsharp.github.io/fsharp-core-docs/reference/fsharp-control-fsharpasync.html#Parallel) with your desired degree of parallelism? – Brian Berns Feb 08 '22 at 20:52
  • @BrianBerns Thanks for responding, however `Async.Parallel` (and `Sequential`) also buffer results to maintain order. The bigger problem with dropping from `AsyncSeq` to `Seq` would be the fact that the lazy (no CPU when waiting) nature of the pipeline would be lost - if I am doing 100k async calls, I would like to be able to work through those with constrained parallelism, but be folding the results as they arrive. – Ruben Bartelink Feb 08 '22 at 22:22
  • Hmm. I'm not an async expert, but I don't think that `AsyncSeq.fold` works like that. If the first call in the sequence takes 30 minutes to complete, `AsyncSeq.fold` is going to wait that long before it starts accumulating results, because it's inherently sequential. It can't fold the results out of order. – Brian Berns Feb 08 '22 at 23:26
  • Not suggesting that the `AsyncSeq.fold` would work like that - it operates like any other fold in that it will process items in order (but if nothing comes out of the pipe for 2 minutes between the first and second items, it uses zero CPU / threadpool resources). I'm asking about a specific kind of operator that takes an AsyncSeq as input and then efficiently transforms those items, preserving the quantity of items, maximizing the throughput, but potentially reordering items. I am specifically stating in the OP that I want to break the normal rule of order preservation in this `map` op. – Ruben Bartelink Feb 09 '22 at 00:43

2 Answers


If I'm understanding your requirements, then something like this will work. It effectively combines the unordered, throttled iter (iterAsyncParallelThrottled) with a channel, so that each result is passed downstream instead of being discarded.

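open System.Threading.Channels
open FSharp.Control

let mapAsyncParallelBoundedUnordered boundedAmount (mapper: 't -> Async<'u>) source = asyncSeq {
    let! ct = Async.CancellationToken
    let channel = Channel.CreateUnbounded<'u>()

    // Producer: map up to boundedAmount items at once, writing each result as soon as it completes
    let! _ =
        async {
            do!
                source
                |> AsyncSeq.iterAsyncParallelThrottled boundedAmount (fun s -> async {
                    let! result = mapper s
                    // WriteAsync returns a ValueTask, which async cannot bind directly
                    do! channel.Writer.WriteAsync(result, ct).AsTask() |> Async.AwaitTask
                })

            channel.Writer.Complete()
        }
        |> Async.StartChild

    // Consumer: the channel holds finished results, so they can be yielded as-is, in completion order
    yield! channel.Reader.ReadAllAsync(ct) |> AsyncSeq.ofAsyncEnum
}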

With a small variation on the above (starting each mapping as a child computation and queuing its handle), you can make it ordered while still bounding the parallelism.

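let mapAsyncParallelBounded boundedAmount (mapper: 't -> Async<'u>) source = asyncSeq {
    let! ct = Async.CancellationToken
    // The bounded capacity is what limits how many started computations can be queued
    let channel = Channel.CreateBounded<Async<'u>>(BoundedChannelOptions(boundedAmount))

    let! _ =
        async {
            do!
                source
                |> AsyncSeq.iterAsync (fun s -> async {
                    // Start the work now and enqueue a handle that awaits its result
                    let! orderChild = mapper s |> Async.StartChild
                    do! channel.Writer.WriteAsync(orderChild, ct).AsTask() |> Async.AwaitTask
                })

            // Without completing the writer, ReadAllAsync below would never finish
            channel.Writer.Complete()
        }
        |> Async.StartChild

    // Handles come out in input order; awaiting each one in turn preserves that order
    for item in channel.Reader.ReadAllAsync(ct) |> AsyncSeq.ofAsyncEnum do
        let! toReturn = item
        yield toReturn
}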
akara
  • Thanks - these both work well. it seems that the unordered is not a big win in my case so I'll probably stick with the `mapAsyncParallelBounded` variant for general usage – Ruben Bartelink Feb 10 '22 at 12:08

Here's a testbed I used to validate @akara's excellent work:

#r "nuget:FSharp.Control.AsyncSeq"
open FSharp.Control
module AsyncSeqEx =

    open System.Threading.Channels

    let mapAsyncParallelBoundedUnordered boundedAmount (mapper: 't -> Async<'u>) source = asyncSeq {
        let! ct = Async.CancellationToken
        let channel : Channel<'u> = Channel.CreateUnbounded()
        let handle req = async {
            let! res = mapper req
            do! let t = channel.Writer.WriteAsync(res, ct) in t.AsTask() |> Async.AwaitTask }
        let! _ = Async.StartChild <| async {
            do! source |> AsyncSeq.iterAsyncParallelThrottled boundedAmount handle
            channel.Writer.Complete() }
        yield! channel.Reader.ReadAllAsync(ct) |> AsyncSeq.ofAsyncEnum
    }

I also ported the same code to use AsyncSeqSrc instead of channels, which seems to work too, with equivalent perf:

    // AsyncSeqSrc-based reimpl of the above
    let mapAsyncParallelBoundedUnordered2 boundedAmount (mapper: 't -> Async<'u>) source = asyncSeq {
        let output = AsyncSeqSrc.create ()
        // Subscribe before starting the producer; as far as I can tell, toAsyncSeq only sees items put after it is called
        let results = AsyncSeqSrc.toAsyncSeq output
        let handle req = async { let! res = mapper req in AsyncSeqSrc.put res output }
        let! _ = Async.StartChild <| async {
            do! source |> AsyncSeq.iterAsyncParallelThrottled boundedAmount handle
            AsyncSeqSrc.close output }
        yield! results
    }

The following impl, leaning on AsyncSeq.mapAsyncParallel, seems to achieve similar perf to both:

module Async =

    let parallelThrottled dop f = Async.Parallel(f, maxDegreeOfParallelism = dop)
    type Semaphore(max) =
        let inner = new System.Threading.SemaphoreSlim(max)
        member _.Await() = async {
            let! ct = Async.CancellationToken
            return! inner.WaitAsync ct |> Async.AwaitTask }
        member _.Release() =
            inner.Release() |> ignore
    let throttle degreeOfParallelism f =
        let s = Semaphore degreeOfParallelism
        fun x -> async {
            do! s.Await()
            try return! f x
            finally s.Release() }

module AsyncSeq =

    open FSharp.Control

    // see https://stackoverflow.com/a/71065152/11635
    let mapAsyncParallelThrottled degreeOfParallelism (f: 't -> Async<'u>) : AsyncSeq<'t> -> AsyncSeq<'u> =
        let throttle = Async.throttle degreeOfParallelism
        AsyncSeq.mapAsyncParallel (throttle f)
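
A quick way to convince yourself that a semaphore wrapper of this shape really caps concurrency is to count how many jobs are in flight at once. The sketch below is an assumption-laden stand-in rather than the code above: it inlines its own copy of `throttle` so it runs without the NuGet package, drives the jobs with `Async.Parallel` instead of `AsyncSeq.mapAsyncParallel`, and the names `job`, `inFlight` and `peak` are made up for illustration.

```fsharp
open System.Threading

// Stand-in for the semaphore-based throttle, inlined so the snippet runs on its own
let throttle (degreeOfParallelism: int) (f: 'a -> Async<'b>) : 'a -> Async<'b> =
    let gate = new SemaphoreSlim(degreeOfParallelism)
    fun x -> async {
        let! ct = Async.CancellationToken
        do! gate.WaitAsync(ct) |> Async.AwaitTask
        try
            return! f x
        finally
            gate.Release() |> ignore }

let dop = 4
let sync = obj ()
let inFlight = ref 0   // jobs currently between "start" and "finish"
let peak = ref 0       // highest inFlight value observed

// Hypothetical job: records its own concurrency, then sleeps for ms milliseconds
let job (ms: int) = async {
    lock sync (fun () ->
        inFlight.Value <- inFlight.Value + 1
        peak.Value <- max peak.Value inFlight.Value)
    do! Async.Sleep ms
    lock sync (fun () -> inFlight.Value <- inFlight.Value - 1)
    return ms }

// Wrap once, so that every call shares the same semaphore
let throttledJob = throttle dop job

let results =
    [ for i in 1 .. 40 -> (i % 5 + 1) * 10 ]
    |> List.map throttledJob
    |> Async.Parallel
    |> Async.RunSynchronously

printfn "jobs: %d, peak in flight: %d (limit %d)" results.Length peak.Value dop
```

The counter is updated inside the throttled function, so `peak` measures what the semaphore lets through, not how many computations `Async.Parallel` has scheduled.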

Testbed:

let dop = 10
let r = System.Random()
let durations = Array.init 10000 (fun _ -> r.Next(10, 100))
let work =
    let sleep (x : int) = async {
        do! Async.Sleep x
        return x
    }
    AsyncSeq.ofSeq durations |> AsyncSeq.mapAsyncParallelThrottled dop sleep
let start = System.Diagnostics.Stopwatch.StartNew()
let results = work |> AsyncSeq.toArrayAsync |> Async.RunSynchronously
let timeTaken = start.ElapsedMilliseconds
let totalTimeTaken = Array.sum results
let expectedWallTime = float totalTimeTaken / float dop
let overhead = timeTaken - int64 expectedWallTime
let inline stringf format (x : ^a) =
    (^a : (member ToString : string -> string) (x, format))
let inline sep x = stringf "N0" x
printfn $"Gross {sep totalTimeTaken}ms Threads {dop} Wall {sep timeTaken}ms overhead {sep overhead}ms ordered: {durations = results}"

Result:

Gross 544,873ms Threads 10 Wall 55,659ms overhead 1,172ms ordered: True

For now, it seems that for my use case there's no major win to be had by admitting unordered results vs. just having the function argument to mapAsyncParallel self-govern to achieve the desired throttling effect.
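
One place where ordering can still show up is the time until the first result, not the total wall time. The sketch below is only an illustration under made-up assumptions: the durations and the `sleep` helper are hypothetical. It puts one slow item at the head of the input and asks the order-preserving `AsyncSeq.mapAsyncParallel` for its first result. Because results come out in input order, the first one should be the slow item, even though the fast ones finished earlier.

```fsharp
#r "nuget: FSharp.Control.AsyncSeq"
open FSharp.Control

// One slow item at the head, fast items behind it (durations in ms, made up for illustration)
let durations = [ 500; 10; 10; 10 ]

let sleep (ms: int) = async {
    do! Async.Sleep ms
    return ms }

let sw = System.Diagnostics.Stopwatch.StartNew()

let first =
    AsyncSeq.ofSeq durations
    |> AsyncSeq.mapAsyncParallel sleep   // ordered: results are emitted in input order
    |> AsyncSeq.tryFirst
    |> Async.RunSynchronously

printfn "first result: %A after %dms" first sw.ElapsedMilliseconds
```

When everything is folded into a single outcome at the end, as in the testbed, that head-of-line delay is absorbed into the total, which fits the small overhead measured above.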

Ruben Bartelink