Assume I have a method that performs some I/O operation and asynchronously returns data as some type implementing IAsyncEnumerable<T>.

Example:

class MyDataStream: IAsyncEnumerable<int> {
 //Code omitted for brevity
}

class Bla {
  MyDataStream GetData() {
    //Code omitted for brevity
  }
}

I want to keep it as an IAsyncEnumerable as far "upstream" as possible, passing it through a variety of "middleware" and eventually streaming the result down an HTTP API as JSON elements.

Example:

var data = new Bla().GetData();
IAsyncEnumerable<string> stringData = ToStringMiddleware(data); //performs await foreach
//..other "middleware"
return stringData;

Now, in some cases, some middleware may decide to iterate over the whole stream (e.g. to perform some aggregation function on the data).

var data = new Bla().GetData();
double sum = await Sum(data); //-> this will iterate over the whole stream.
//do something with sum (e.g. log it, or whatever)
//..other "middleware"
return data;

But other middleware down the pipeline doesn't know that. That other middleware may then choose to do a similar thing.

var data = new Bla().GetData();
double sum = await Sum(data); //-> this will iterate over the whole stream.
double average = await Average(data); //-> another iteration..
//..other "middleware"
return data;

I may end up with multiple iterations over the stream (each performing the underlying I/O operation). I don't like that.

I can implement the IAsyncEnumerable interface in such a way that it only evaluates once, so the first await foreach will basically keep the data in a private collection for the next iterations. Easy enough. Now, if the data is only evaluated at the end by the serializer that writes it to the HTTP response, we've only accessed the data once, and if any middleware iterates over the stream, we also access the data just once.
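
A minimal sketch of such an evaluate-once wrapper might look like the following. The class name `EvaluateOnceAsyncEnumerable<T>` is only a placeholder (the name is exactly what is in question here), and the details are assumptions: it shares a single source enumerator between all consumers, serializes access with a `SemaphoreSlim`, and does not try to handle source exceptions or dispose the source when it is never fully enumerated.

```csharp
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Placeholder name; wraps any IAsyncEnumerable<T> so the source is read at most once.
public sealed class EvaluateOnceAsyncEnumerable<T> : IAsyncEnumerable<T>
{
    private readonly IAsyncEnumerable<T> _source;
    private readonly List<T> _items = new List<T>();          // items read so far
    private readonly SemaphoreSlim _gate = new SemaphoreSlim(1, 1);
    private IAsyncEnumerator<T>? _enumerator;                  // the single shared source enumerator
    private bool _completed;

    public EvaluateOnceAsyncEnumerable(IAsyncEnumerable<T> source) => _source = source;

    public async IAsyncEnumerator<T> GetAsyncEnumerator(CancellationToken cancellationToken = default)
    {
        for (int index = 0; ; index++)
        {
            T item;
            await _gate.WaitAsync(cancellationToken);
            try
            {
                if (index >= _items.Count)
                {
                    // Nothing stored at this position yet: pull one more item from the source.
                    if (_completed) yield break;
                    // The source is shared between consumers, so no per-consumer token is passed on.
                    _enumerator ??= _source.GetAsyncEnumerator();
                    if (!await _enumerator.MoveNextAsync())
                    {
                        _completed = true;
                        await _enumerator.DisposeAsync();
                        yield break;
                    }
                    _items.Add(_enumerator.Current);
                }
                item = _items[index];
            }
            finally
            {
                _gate.Release();
            }
            yield return item; // yielded outside the lock
        }
    }
}
```

With something like `var data = new EvaluateOnceAsyncEnumerable<int>(new Bla().GetData());`, both `Sum(data)` and `Average(data)` would read from the stored items after the first pass instead of triggering the I/O again.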

The big problem I have now is: What do I call that implementation? It's not really a cache, I think. Nor is it a buffer.

I know, this is not so much a technical question as it is a naming question. But I'd like to have my code as understandable as possible and I don't want to name it "MyDataStreamThatOnlyGetsEvaluatedOnce" because it feels stupid and nobody in my team has come up with a better name yet :D

So any ideas or input would be appreciated.

Thank you


EDIT: I can understand why this question has been closed as opinion-based. Nevertheless I would like to thank the contributors. The discussion in the comments as well as the provided answers have indeed helped me solve my "problem". THANKS!

Theodor Zoulias
Ar Es
  • You are using an HTTP Client where the connection closes after each request/response. You may want to switch to a WebSocket where the connection stays open after each request/response. – jdweng Aug 08 '23 at 09:05
  • I'd call it a buffer. It's buffering the response, so that it can be read multiple times without hitting the network again – canton7 Aug 08 '23 at 09:06
  • Actually I'd say that it's caching the data rather than buffering it. Buffering implies reading data using a smaller block size than the overall size of the data (to me). – Matthew Watson Aug 08 '23 at 09:10
  • You might be better asking this on the [Software Engineering](https://softwareengineering.stackexchange.com/help/on-topic) site. – Matthew Watson Aug 08 '23 at 09:18
  • Maybe [`Memoize`](https://github.com/dotnet/reactive/blob/main/Ix.NET/Source/System.Interactive/System/Linq/Operators/Memoize.cs)? – Theodor Zoulias Aug 08 '23 at 09:22
  • Given that [`Memoization`](https://en.wikipedia.org/wiki/Memoization) is exactly what the OP is doing, this would seem to be a good idea! – Matthew Watson Aug 08 '23 at 09:23
  • @MatthewWatson there is no `Memoize` operator for asynchronous sequences though (at least not in the [System.Linq.Async](https://github.com/dotnet/reactive/tree/main/Ix.NET/Source/System.Linq.Async/System/Linq/Operators) or [System.Interactive.Async](https://github.com/dotnet/reactive/tree/main/Ix.NET/Source/System.Interactive.Async/System/Linq/Operators) packages). – Theodor Zoulias Aug 08 '23 at 09:27
  • This question might be relevant: [How to check an IEnumerable for multiple conditions with a single enumeration without buffering?](https://stackoverflow.com/questions/58578480/how-to-check-an-ienumerable-for-multiple-conditions-with-a-single-enumeration-wi) – Theodor Zoulias Aug 08 '23 at 09:35

2 Answers

1

As with the regular IEnumerable, an IAsyncEnumerable doesn't have to be implemented in a way that allows it to be enumerated multiple times.

Enumerables simply aren't guaranteed to support being enumerated twice. The same concept applies to Streams: they also do not guarantee that you can go back and read something again. Granted, they have a CanSeek property to signal that, but IEnumerable doesn't even have that.

So, indeed the question is simply a naming one.

I would suggest NOT appending any special suffix to the name at all, simply because it is in the nature of all enumerators to guarantee only the first run.

However, you may want to specify that behaviour in the XML comment:

/// <remarks>
/// The returned <see cref="IAsyncEnumerable{T}"/> can only be enumerated once.
/// </remarks>
MyDataStream GetData() {

Update

If your implementation is buffering the sequence, then you're simply hiding that fact for no particular reason.

If your implementation always returns a list that is cached/buffered in local memory, then you should simply return IReadOnlyList<int>/IReadOnlyCollection<int> instead of the enumerable.

It would also make it simpler for the callers to use.
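
As a rough illustration of that alternative, a helper along these lines could materialize the sequence once and hand callers a plain list. The names `AsyncEnumerableMaterializer` and `ToReadOnlyListAsync` are made up for this sketch; the System.Linq.Async package offers a comparable `ToListAsync` operator if you already reference it.

```csharp
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper: reads the whole sequence once and returns it as a list.
public static class AsyncEnumerableMaterializer
{
    public static async Task<IReadOnlyList<T>> ToReadOnlyListAsync<T>(
        this IAsyncEnumerable<T> source,
        CancellationToken cancellationToken = default)
    {
        var items = new List<T>();
        // WithCancellation forwards the token to the source's GetAsyncEnumerator.
        await foreach (var item in source.WithCancellation(cancellationToken))
        {
            items.Add(item);
        }
        return items;
    }
}
```

The return type then tells every caller that the data is already in memory and safe to iterate as often as needed.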

Another thought

I can think of a weird case where you would sometimes NOT have any use for the returned sequence, but sometimes you would indeed need to buffer the data.

In that case, the best approach is to acknowledge that the decision belongs to the caller: whether they want the full result and whether they want it cached is theirs to make, and if they truly need buffering, it should be their job to implement it to their liking.

That could be an extension interface like IBufferingAsyncEnumerable<> or something else.

AgentFire
0

If I understand the question correctly, I would call it a cache, since the goal is to avoid multiple I/O reads. I think that fits quite well with the common understanding of what a cache is. But I would not object to using "Buffered". I would suggest asking your team instead, since it is their understanding that actually matters. Whatever name you pick, make sure the behavior is well described in the comments.

Note that you should assume that multiple iterations of any IEnumerable may redo some expensive operation, and may not return the same result. You may get a warning about this in your IDE. You should normally do something like .ToList() to avoid this. So one approach could be to agree in your team to not do multiple iterations of any IAsyncEnumerable, and throw an exception if this is done.
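
The "throw if enumerated twice" convention could be enforced with a thin wrapper, roughly like this. It is only a sketch: `SingleUseAsyncEnumerable<T>` is a hypothetical name, and it treats any second call to `GetAsyncEnumerator` as an error, even if the first enumeration was abandoned early.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// Hypothetical wrapper: allows exactly one enumeration of the underlying sequence.
public sealed class SingleUseAsyncEnumerable<T> : IAsyncEnumerable<T>
{
    private readonly IAsyncEnumerable<T> _source;
    private int _started; // 0 = not yet enumerated, 1 = an enumerator was already handed out

    public SingleUseAsyncEnumerable(IAsyncEnumerable<T> source) => _source = source;

    public IAsyncEnumerator<T> GetAsyncEnumerator(CancellationToken cancellationToken = default)
    {
        // Interlocked.Exchange makes the check safe against concurrent callers.
        if (Interlocked.Exchange(ref _started, 1) == 1)
        {
            throw new InvalidOperationException("This sequence can only be enumerated once.");
        }
        return _source.GetAsyncEnumerator(cancellationToken);
    }
}
```

A middleware that accidentally iterates the stream a second time then fails fast instead of silently repeating the I/O.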

JonasH