I have an IAsyncEnumerable that returns what is essentially a sequence of Key/IEnumerable<Value> pairs. The code that consumes this and other similar enumerables assumes that the keys it receives are unique. But one of my data sources does not obey this constraint. It does, however, keep duplicate keys grouped together. (You won't see [k1, k2, k1].)

This should be fairly simple to resolve with a wrapper that partitions the data by key and concatenates the values, except that I don't see any usable partitioning operator in System.Linq.Async. There are GroupBy and ToLookup, but both of these are eager operators that will consume the entire sequence immediately. This is not suitable for my purposes, due to large amounts of data being involved.

Is there any simple way to partition an IAsyncEnumerable similar to GroupBy, grouping inputs according to a key selector, but keeping its behavior fully lazy and generating new groupings on demand when the key changes?

EDIT: I looked to see if MoreLINQ has anything like this, and found GroupAdjacent, but the code shows that, while it does not eagerly consume the entire input sequence, it will still eagerly consume the entire group when starting a new group. I'm looking for a method that will return a lazy enumerable in its groupings. It's trickier than it sounds!

Mason Wheeler

1 Answer

Here is a GroupAdjacent operator for asynchronous sequences, similar to the operator of the same name in the MoreLinq package, with the difference that it doesn't buffer the elements of the emitted groupings. The groupings are expected to be enumerated fully, in the correct order, one grouping at a time. Otherwise an InvalidOperationException will be thrown.

This implementation requires the package System.Linq.Async, because it emits groupings that implement the IAsyncGrouping<out TKey, out TElement> interface.

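// Place these members inside a static class. Namespaces used: System,
// System.Collections.Generic, System.Diagnostics, System.Linq,
// System.Runtime.CompilerServices, System.Threading.
/// <summary>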
/// Groups the adjacent elements of a sequence according to a specified
/// key selector function.
/// </summary>
/// <remarks>
/// The groups don't contain buffered elements.
/// Enumerating the groups in the correct order is mandatory.
/// </remarks>
public static IAsyncEnumerable<IAsyncGrouping<TKey, TSource>>
    GroupAdjacent<TSource, TKey>(
        this IAsyncEnumerable<TSource> source,
        Func<TSource, TKey> keySelector,
        IEqualityComparer<TKey> keyComparer = null)
{
    ArgumentNullException.ThrowIfNull(source);
    ArgumentNullException.ThrowIfNull(keySelector);
    keyComparer ??= EqualityComparer<TKey>.Default;
    return Implementation();

    async IAsyncEnumerable<IAsyncGrouping<TKey, TSource>> Implementation(
        [EnumeratorCancellation] CancellationToken cancellationToken = default)
    {
        Tuple<TSource, TKey, bool> sharedState = null;
        var enumerator = source.GetAsyncEnumerator(cancellationToken);
        try
        {
            if (!await enumerator.MoveNextAsync().ConfigureAwait(false))
                yield break;
            var firstItem = enumerator.Current;
            var firstKey = keySelector(firstItem);
            sharedState = new(firstItem, firstKey, true);

            Tuple<TSource, TKey, bool> previousState = null;
            while (true)
            {
                var state = Volatile.Read(ref sharedState);
                if (ReferenceEquals(state, previousState))
                    throw new InvalidOperationException("Out of order enumeration.");
                var (item, key, exists) = state;
                if (!exists) yield break;
                previousState = state;
                yield return new AsyncGrouping<TKey, TSource>(key, GetAdjacent(state));
            }
        }
        finally { await enumerator.DisposeAsync().ConfigureAwait(false); }

        async IAsyncEnumerable<TSource> GetAdjacent(Tuple<TSource, TKey, bool> state)
        {
            if (!ReferenceEquals(Volatile.Read(ref sharedState), state))
                throw new InvalidOperationException("Out of order enumeration.");
            var (stateItem, stateKey, stateExists) = state;
            Debug.Assert(stateExists);
            yield return stateItem;
            Tuple<TSource, TKey, bool> nextState;
            while (true)
            {
                if (!ReferenceEquals(Volatile.Read(ref sharedState), state))
                    throw new InvalidOperationException("Out of order enumeration.");
                if (!await enumerator.MoveNextAsync().ConfigureAwait(false))
                {
                    nextState = new(default, default, false);
                    break;
                }
                var item = enumerator.Current;
                var key = keySelector(item);
                if (!keyComparer.Equals(key, stateKey))
                {
                    nextState = new(item, key, true);
                    break;
                }
                yield return item;
            }
            if (!ReferenceEquals(Interlocked.CompareExchange(
                ref sharedState, nextState, state), state))
                throw new InvalidOperationException("Out of order enumeration.");
        }
    }
}

private class AsyncGrouping<TKey, TElement> : IAsyncGrouping<TKey, TElement>
{
    private readonly TKey _key;
    private readonly IAsyncEnumerable<TElement> _sequence;

    public AsyncGrouping(TKey key, IAsyncEnumerable<TElement> sequence)
    {
        _key = key;
        _sequence = sequence;
    }

    public TKey Key => _key;

    public IAsyncEnumerator<TElement> GetAsyncEnumerator(
        CancellationToken cancellationToken = default)
    {
        return _sequence.GetAsyncEnumerator(cancellationToken);
    }
}

Usage example:

IAsyncEnumerable<IGrouping<string, double>> source = //...

IAsyncEnumerable<IAsyncGrouping<string, double>> merged = source
    .GroupAdjacent(g => g.Key)
    .Select(gg => new AsyncGrouping<string, double>(
        gg.Key, gg.Select(g => g.ToAsyncEnumerable()).Concat()));

This example starts with a sequence that contains groupings, and the goal is to combine any adjacent groupings that have the same key into a single asynchronous grouping that contains all of their elements. After applying the GroupAdjacent(g => g.Key) operator we get this type:

IAsyncEnumerable<IAsyncGrouping<string, IGrouping<string, double>>>

So in this phase each asynchronous grouping contains inner groupings, not individual elements. We need to Concat this nested structure in order to get what we want. The Concat operator exists in the System.Interactive.Async package, and it has this signature:

public static IAsyncEnumerable<TSource> Concat<TSource>(
    this IAsyncEnumerable<IAsyncEnumerable<TSource>> sources);

The ToAsyncEnumerable operator (System.Linq.Async) is applied to each synchronous inner grouping, so that the result satisfies this signature.
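
If taking a dependency on System.Interactive.Async just for Concat is undesirable, the same flattening step can probably be written as a small async iterator. The following is only a minimal sketch under that assumption: Flatten, Sample and DemoAsync are hypothetical names that do not come from any package, and the sample data is made up for illustration.

```csharp
using System.Collections.Generic;
using System.Threading.Tasks;

public static class FlattenSketch
{
    // Hypothetical helper (not part of System.Linq.Async or
    // System.Interactive.Async): lazily yields the elements of each
    // synchronous inner sequence, in order, without buffering.
    public static async IAsyncEnumerable<T> Flatten<T>(
        this IAsyncEnumerable<IEnumerable<T>> sources)
    {
        await foreach (var inner in sources.ConfigureAwait(false))
        {
            foreach (var item in inner)
                yield return item;
        }
    }

    // Made-up sample input: two inner sequences arriving asynchronously.
    public static async IAsyncEnumerable<IEnumerable<double>> Sample()
    {
        yield return new[] { 1.0, 2.0 };
        await Task.Yield();
        yield return new[] { 3.0 };
    }

    // Drains the flattened sequence into a list.
    public static async Task<List<double>> DemoAsync()
    {
        var result = new List<double>();
        await foreach (var value in Sample().Flatten())
            result.Add(value);
        return result;
    }
}
```

Each inner sequence is enumerated only when the consumer asks for the next element, so this keeps the lazy behavior of the Select + Concat combination. The sketch does not forward a CancellationToken. A production version would likely add one with the [EnumeratorCancellation] attribute, as the GroupAdjacent implementation does.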

Theodor Zoulias