I've used IAsyncEnumerable for my paginated HttpClient requests, but I also need to GroupBy the results. So I've implemented code like the one below:

public class Item 
{
   public int Id {get; set;}
   public string Name {get; set;}
}

public async IAsyncEnumerable<Item> GetItems()
{
   while(hasMorePage)
   {
       // ... get paginated items

       foreach(var item in paginatedItems)
       {
         yield return item;
       }
   }
}

// Should find the most repeated items (by Id) and their counts.
public async Task GroupItems()
{
    IAsyncEnumerable<Item> items = GetItems();

    // IAsyncGrouping
    await foreach (var item in items.GroupBy(i => i.Id)
                                    .OrderByDescendingAwait(i => i.CountAsync())
                                    .Take(10))
    {
        Console.WriteLine(item.Key.ToString() + (await item.CountAsync()).ToString());
    }
}

This code works exactly as I expected. But I would like to understand how GroupBy works here, because it needs to have all the items before it can group them by Id. Is there something that I'm missing? Or is there anything I can refactor for performance?

sercanD
  • GroupBy creates a two-dimensional array. Your Take(10) is taking the first 10 keys. – jdweng Jul 27 '22 at 21:48
  • @jdweng alright, it's more clear now. But I also want to understand, does it work synchronously or asynchronously because I use yield return, await foreach takes item when it's ready. Is it adding items to the array that you mentioned asynchronously one by one and then grouping them when there is no item left? – sercanD Jul 27 '22 at 22:01
  • @TheodorZoulias added more details. – sercanD Jul 27 '22 at 22:01
  • The code is not blocking so other parts of the code do not have to wait for code to complete. – jdweng Jul 27 '22 at 22:12
  • What is the `var` in the `foreach(var item in paginatedItems)`? What is the `var` in the `var items = GetItems()`? What is the `var` in the `await foreach(var item in items.GroupBy(...`? – Theodor Zoulias Jul 27 '22 at 22:17
  • But `GroupBy` itself needs to pull all the results, so the `foreach` will only begin once all results are received. Also `await item.CountAsync().ToString()` should be `(await item.CountAsync()).ToString()` – Charlieface Jul 27 '22 at 23:04
  • *"This code works perfectly fine as I expected"* -- Sercan the problem for anyone who would like to answer this question, and for anyone else who might stumble on this question in the future trying to solve their own problem, is that we don't know what is the expected behavior of your program. I understand that your focus is to solve your problem, not to post the perfect question. Anyway, in case you have found the answer to your question, you could post it as a self-answer. Please consider coming back and improving the question in the future, when you have time available. – Theodor Zoulias Jul 27 '22 at 23:10
  • The code is not very important here, I just wanted to share an example. The topic is how GroupBy works with IAsyncEnumerable, so don't focus on the code that much. I just wanted to understand its behavior. – sercanD Jul 28 '22 at 06:26
  • @sercanD what behavior do you want to understand? Whether it consumes the entire source before producing a result? It does. Whether it causes blocking while doing so? It doesn't. If you have a long-running stream of events you'll have to wait until the stream ends to get any results. That's because `GroupBy` calculates the groupings [in its allocation phase](https://github.com/dotnet/reactive/blob/main/Ix.NET/Source/System.Linq.Async/System/Linq/Operators/GroupBy.cs#L278), then returns them in its iteration phase – Panagiotis Kanavos Jul 28 '22 at 08:57
  • @sercanD if you want to process streams of events you should look at Rx.NET, which was built by the same team that created System.Linq.Async. In Rx.NET [GroupBy](https://reactivex.io/documentation/operators/groupby.html) will emit a new group stream when a new key value is encountered. – Panagiotis Kanavos Jul 28 '22 at 09:00
  • @sercanD the ALinq repo you linked to has *nothing* to do with .NET's IAsyncEnumerable. It's an 8 year old repo. – Panagiotis Kanavos Jul 28 '22 at 09:09
  • @PanagiotisKanavos Thank you for making it clearer for me. You are right, that link has nothing to do with Async LINQ. I'll delete it to avoid confusion. – sercanD Jul 28 '22 at 18:09

1 Answer

First of all, the ALinq repo linked in the comments has nothing to do with .NET's IAsyncEnumerable or System.Linq.Async. It's an 8 year old repo that doesn't even target .NET Core. System.Linq.Async is maintained by the same team that built Reactive Extensions for .NET, and its code is in the same GitHub repository.

Second, it's unclear what behavior needs to be explained.

  • Does GroupBy block? No it doesn't.
  • Does GroupBy have to consume the entire source before producing a result? Yes it does.

If you have a long-running stream of events you'll have to wait until the stream ends to get any results. That's because GroupBy calculates the groupings in its allocation phase, then returns them in its iteration phase:

protected override async ValueTask<bool> MoveNextCore()
{
    switch (_state)
    {
        case AsyncIteratorState.Allocated:
            _lookup = await Internal.Lookup<TKey, TSource>.CreateAsync(_source, _keySelector, _comparer, _cancellationToken).ConfigureAwait(false);
            _enumerator = _lookup.ApplyResultSelector(_resultSelector).GetEnumerator();
            _state = AsyncIteratorState.Iterating;
            goto case AsyncIteratorState.Iterating;

        case AsyncIteratorState.Iterating:
            if (_enumerator!.MoveNext())
            {
                _current = _enumerator.Current;
                return true;
            }

            await DisposeAsync().ConfigureAwait(false);
            break;
    }

    return false;
}
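
As a rough illustration of what "consume everything first, then return results" means for the counting case in the question, the sketch below drains an IAsyncEnumerable<Item> in a single await foreach pass and only then produces a result. It is not the library's implementation: ItemCounter and TopIdsAsync are hypothetical names, Item mirrors the class from the question, and a plain Dictionary stands in for the internal lookup. It uses only the base class library, so it does not need System.Linq.Async.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Mirrors the Item class from the question.
public class Item
{
    public int Id { get; set; }
    public string Name { get; set; } = "";
}

// Hypothetical helper, not part of System.Linq.Async.
public static class ItemCounter
{
    // Counts items per Id while the source is being streamed, then returns
    // the `take` most frequent Ids together with their counts.
    public static async Task<List<KeyValuePair<int, int>>> TopIdsAsync(
        IAsyncEnumerable<Item> items, int take)
    {
        var counts = new Dictionary<int, int>();

        // Asynchronous, non-blocking pass over the whole source.
        await foreach (var item in items)
        {
            counts.TryGetValue(item.Id, out var current); // current is 0 for a new Id
            counts[item.Id] = current + 1;
        }

        // Ordering is only possible once the source has completed.
        return counts
            .OrderByDescending(pair => pair.Value)
            .Take(take)
            .ToList();
    }
}
```

With the question's GetItems(), usage could look like `var top = await ItemCounter.TopIdsAsync(GetItems(), 10);`. Compared with the GroupBy version, this keeps only one counter per Id instead of every item, and it avoids calling CountAsync on each group twice (once for ordering and once for printing).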

If you want to process streams of events you should look at Rx.NET, which was built by the same team that created System.Linq.Async. In Rx.NET, GroupBy will emit a new group stream when a new key value is encountered:

[Image: Reactive Extensions GroupBy illustration]

Notice that Rx.NET's GroupBy actually partitions the event stream by the grouping key and emits streams, not groupings. Subscribers will subscribe to those streams and process their events. This Aggregation example demonstrates this:

var source = Observable.Interval(TimeSpan.FromSeconds(0.1)).Take(10);
var group = source.GroupBy(i => i % 3);
group.Subscribe(
  grp => 
    grp.Min().Subscribe(
      minValue => 
        Console.WriteLine("{0} min value = {1}", grp.Key, minValue)),
  () => Console.WriteLine("Completed"));

If you need to process a long-running IAsyncEnumerable<> stream, you can convert it to an IObservable<> with ToObservable and then use the Rx.NET operators on it.
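
A minimal sketch of that conversion, assuming both the System.Linq.Async and System.Reactive NuGet packages are referenced. RunningCounts and PerId are hypothetical names introduced for illustration, and Item mirrors the class from the question. Instead of waiting for the source to end, it emits an updated count for an Id every time an item with that Id arrives.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;           // ToObservable() on IAsyncEnumerable<T> (System.Linq.Async)
using System.Reactive.Linq;  // GroupBy, Scan, Select, SelectMany on IObservable<T> (System.Reactive)

// Mirrors the Item class from the question.
public class Item
{
    public int Id { get; set; }
    public string Name { get; set; } = "";
}

// Hypothetical helper, not part of either library.
public static class RunningCounts
{
    // Converts the async stream to an observable and emits (Id, Count)
    // each time an item arrives, where Count is the running total for that Id.
    public static IObservable<(int Id, int Count)> PerId(IAsyncEnumerable<Item> items)
    {
        return items
            .ToObservable()
            // Rx's GroupBy emits a new group stream the first time a key is seen.
            .GroupBy(item => item.Id)
            .SelectMany(group => group
                // Scan emits every intermediate value of the running count.
                .Scan(0, (count, _) => count + 1)
                .Select(count => (Id: group.Key, Count: count)));
    }
}
```

With the question's GetItems(), usage could look like `await RunningCounts.PerId(GetItems()).ForEachAsync(x => Console.WriteLine($"{x.Id}: {x.Count}"));`. Scan is used here rather than Count because Count would only emit once each group completes, which again means waiting for the end of the source.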

Panagiotis Kanavos