3

I am working on retrieving about 10 different types of data using http-requests with rather complex dependencies between them. And I am trying to find my way through this in an elegant, readable and maintainable way without waiting unnecessarily.

Let's assume, some method creates a Dictionary<string, Task<int>>. What is the most elegant way to convert this into Task<Dictionary<string, int>>?

The new outer Task should finish, as soon as all Tasks contained in the dictionary are finished.

Of course, I can write this manually:

Dictionary<string, Task<int>> values = GetValues();
Task<Dictionary<string, int>> result = Task.Run(async () => {
    Dictionary<string, int> rewrapped = new();
    foreach (var entry in values) {
        rewrapped.Add(entry.Key, await entry.Value);
    }
    return rewrapped;
});

But isn't there a better way?

Panagiotis Kanavos
  • 120,703
  • 13
  • 188
  • 236
Andreas
  • 1,997
  • 21
  • 35
  • Sounds like your look for this var results = await Task.WhenAll(tasks); "Creates a task that will complete when all of the Task objects in an enumerable collection have completed." – Frallan Sep 14 '22 at 06:57
  • What you ask is ... unusual. A Task isn't a value, it's a promise that something will complete in the future. Why is `Dictionary>` used in the first place? To get what you want you'll have to *await* all the tasks, retrieve their results, then return all of them in a `Dictionary`. You don't need `Task.Run` to do that, but the entire operation is too complicated – Panagiotis Kanavos Sep 14 '22 at 06:57
  • 1
    @Frallan that only works on `IEnumerable>` – Panagiotis Kanavos Sep 14 '22 at 06:57
  • @Andreas why do you want this? The correct answer is *don't do it this way*. It's relatively easy to post code that produces what you want but it's almost certainly the wrong thing to do. – Panagiotis Kanavos Sep 14 '22 at 06:58
  • 1
    Why construct a dictionary of tasks? Complete the tasks then make the dictionary. – Jodrell Sep 14 '22 at 07:04
  • 1
    @PanagiotisKanavos I have a tendency to agree with you here, because it seems like an exercise. But there are scenarios where you wouldn't want to run all tasks to completion and that a `Dictionary>` is agreeable and provided. Like, when you need to wrap and handle for cancellation -- foregoing the running of `Task` but it's an expensive network/IO call. So, let's not get too caught up in trying to comprehend the merit of doing this. – Brett Caswell Sep 14 '22 at 07:12
  • @BrettCaswell the tasks run whether you await them or not. The question's request makes cancelling *harder* because everything has to be awaited. There are ways to collect results from multiple concurrent requests, eg using a ConcurrentDictionary – Panagiotis Kanavos Sep 14 '22 at 07:13
  • @Andreas I posted an example that shows how you can change `GetValues` to both work asynchronously and return the actual values. There are even better ways to do this. You'll have to explain what the actual problem is. – Panagiotis Kanavos Sep 14 '22 at 07:24
  • @Andreas now that the real problem is clear, it's easy to find solutions. There are several ways in the framework itself to handle interconnected operations, by creating a pipeline of steps. I added an example that shows how to create a pipeline that downloads 8 files at a time, parses 2 at a time and inserts them into a database – Panagiotis Kanavos Sep 14 '22 at 08:03

2 Answers2

5

Rule of thumb: Never use Task.Run instead of it being truly async, it will use a thread to mimic asyncronity!

You just have to await the values, then get the Results:

Dictionary<string, Task<int>> values = GetValues();
await Task.WhenAll(values.Values);
Dictionary<string, int> results = values.ToDictionary(p => p.Key, p => p.Value.Result);

You can wrap the last two lines in a method, if you really want a Task<Dictionary<string, int>>.

async Task<Dictionary<string, int>> Unwrap(Dictionary<string, Task<int>> values)
{
    await Task.WhenAll(values.Values);
    return values.ToDictionary(p => p.Key, p => p.Value.Result);
}
wertzui
  • 5,148
  • 3
  • 31
  • 51
  • Thank you for the comment to Task.Run. I did not know that. In your first example, the code will block until all tasks are finished. This is not what I want. The second one is more like it. – Andreas Sep 14 '22 at 07:05
  • 1
    @Andreas the two snippets are *identical*. No, the first won't block, it uses `await` just like the second. Why do you want this in the first place? – Panagiotis Kanavos Sep 14 '22 at 07:06
  • 1
    The code won't block in either of the examples. the `await` keyword will essentially turn the method into a state machine that will continue once it receives an interrupt which normally comes through levels deep down in your system and originates from your hardware (disc, network, ...). I suggest this blog post to read up on async functionality in C#: https://blog.stephencleary.com/2013/11/there-is-no-thread.html – wertzui Sep 14 '22 at 07:08
  • Sorry, I wrote too little to explain. The first code will not do anything else in that method while awaiting `Task.WhenAll`. But your `Unwrap` will return a task that can be awaited later. Even though I don't really like the `WhenAll` and `.Result` from a point of elegance in there. – Andreas Sep 14 '22 at 07:13
  • @Andreas you misunderstood what the code does then. Everyone is asking you why do you want this? It doesn't help and probably doesn't do what you assume. – Panagiotis Kanavos Sep 14 '22 at 07:15
  • Good answer. The `Unwrap` method is missing a `.ConfigureAwait(false)` though. The `Unwrap` could have generic `TKey`, `TValue` parameters. Also it should be noted for completeness that in case multiple tasks fail, only one of the exceptions will be propagated. – Theodor Zoulias Sep 14 '22 at 07:31
  • It won't help you, but after your effort invested, I want to at least hint at the reason I am asking this: I am working on retrieving about 10 different types of data using http-requests with rather complex dependencies between them. And I am trying to find my way through this in an elegant, readable and maintainable way without waiting unnecessarily. And one issue I stumbled across was unwrapping Lists and Dictionaries containing Tasks. And since I hit this issue before, I expected that there is a framework-solution for this. – Andreas Sep 14 '22 at 07:39
  • @Andreas put that in the question itself. Yes, it matters *a lot*. I show in my answer how to solve the actual problem in a better way already. `there is a framework-solution for this.` yes, just not in the way you expected. You can use Channels, Dataflow blocks, IAsyncEnumerable to construct complex processing pipelines – Panagiotis Kanavos Sep 14 '22 at 07:48
  • 2
    @TheodorZoulias `.ConfigureAwait(false)` is not missing here. We do not know if OP maybe needs `.ConfigureAwait(true)` (which is the default), or if he is running inside .Net Core or .Net > 5 where there is no continuation context so `.ConfigureAwait(false)` won't have any impact. – wertzui Sep 14 '22 at 09:40
  • Library-like methods should not be context-aware, and the `Unwrap` is such a method. Omitting the `.ConfigureAwait(false)` just opens the possibility for a deadlock (in case the `Unwrap` is used in a single-thread context in a sync-over-async style) for no reason (other than readability). – Theodor Zoulias Sep 14 '22 at 09:51
  • To give an example, if you do `var dictionary = tasks.Unwrap().Result;` in an event handler of a WinForms application, the application will deadlock. If you add the `.ConfigureAwait(false)`, the application won't deadlock. – Theodor Zoulias Sep 15 '22 at 10:19
1

The request is unusual. A Task isn't a value, it's a promise that something will complete in the future. To get the desired result the code will have to await all the tasks, retrieve their results, then return all of them in a Dictionary<string, int>.

There's almost certainly a better way to solve the actual problem.

One quick and dirty example would be :

async Task<ConcurrentDictionary<string,T>> GetValues<T>(CancellationToken token=default)
{
    var dict=new ConcurrentDictionary<string,T>();
    try
    {
        await Parallel.ForEachAsync(_urls,token, async (url,tk)=>{
            var res=await _httpClient.GetStringAsync(url,tk);
            dict[url]=someResult;
        });
    }
    catch(OperationCancelledException){}
    return dict;    
}

There are far better ways to solve the actual problem though - execute interdependent HttpClient requests. .NET offers several ways to construct asynchronous processing pipelines: Dataflow blocks, Channels, IAsyncEnumerable.

Dataflow Blocks

For example, using Dataflow blocks you can create a pipeline that downloads CSV files, parses them, then inserts the data into a database.

These options specify that 8 CSV files will be downloaded concurrently and two parsed concurrently.

var downloadDOP=8;
var parseDOP=2;
var tableName="SomeTable";

var linkOptions=new DataflowLinkOptions { PropagateCompletion = true};

var downloadOptions =new ExecutionDataflowBlockOptions {
    MaxDegreeOfParallelism = downloadDOP,
};

var parseOptions =new ExecutionDataflowBlockOptions {
    MaxDegreeOfParallelism = parseDOP,
};


The following code creates the pipeline

HttpClient http=new HttpClient(...);


var downloader=new TransformBlock<(Uri,string),FileInfo>(async (uri,path)=>{
    var file=new FileInfo(path);
    using var stream =await httpClient.GetStreamAsync(uri);
    using var fileStream=file.Create();
    await stream.CopyToAsync(stream);
    return file;
},downloadOptions);

var parser=new TransformBlock<FileInfo,Foo[]>(async file=>{
    using var reader = file.OpenText();
    using var csv = new CsvReader(reader, CultureInfo.InvariantCulture);
    var records = csv.GetRecords<Foo>().ToList();
    return records;
},parseOptions);

var importer=new ActionBlock<Foo[]>(async recs=>{
    using var bcp=new SqlBulkCopy(connectionString, SqlBulkCopyOptions.TableLock);
    bcp.DestinationTableName=tableName;

    //Map columns if needed
    ...
    using var reader=ObjectReader.Create(recs);
    await bcp.WriteToServerAsync(reader);
});

downloader.LinkTo(parser,linkOptions);
parser.LinkTo(importer,linkOptions);

Once you have the pipeline, you can start posting URLs to it and await for the entire pipeline to complete:

IEnumerable<(Uri,string)> filesToDownload = ...


foreach(var pair in filesToDownload)
{
    await downloader.SendAsync(pair);
}

downloader.Complete();

await importer.Completion;
Panagiotis Kanavos
  • 120,703
  • 13
  • 188
  • 236
  • Removed it. It's not worth fixing (with eg Zip) now that the OP actually explained what the real problem is - yet another pipeline .... – Panagiotis Kanavos Sep 14 '22 at 07:51
  • If the OP wants help for their real problem, they can post a new question about that. This question is about the narrow problem of converting a dictionary of tasks to a dictionary of results. – Theodor Zoulias Sep 14 '22 at 07:54
  • Thank you for the update. I played around with dataflow-blocks a little and will definitely revisit them next time. They seem to be able to solve some issues in my use-case, but not all. And they seem to be a lot more intrusive to the code than chaining simple tasks together. There is so much more potential in async/await. But for some reason, Linq lacks the necessary overloads to make handling Enumerables of tasks really nice. – Andreas Sep 14 '22 at 16:18