
I am using .NET 6. My use case is to return streamed data from "myapi" through a "middle API (BFF)" to a client app in React.

I have code in the "myapi" endpoint that should yield each result as soon as it receives it -

myapi code -


public async IAsyncEnumerable<string> GetStreamingResponse()
{
    var rawAzureOpenAIRequest = new CompletionsRequest();
    rawAzureOpenAIRequest.ModelToUse = DefaultTextModelToUse;
    CompletionsOptions optns = new CompletionsOptions();
    optns.Prompts.Add("add 6+1 :");
    optns.Prompts.Add("below is the summary of technical consultant role in software");

    var azResponse = await _openAIRepository.GetStreamingResponse(
        rawAzureOpenAIRequest.ModelToUse, optns, canToken);

    await foreach (var choice in azResponse.Value.GetChoicesStreaming())
    {
        await foreach (var message in choice.GetTextStreaming())
        {
            yield return message;
            await Task.Delay(10000); // delay added only to observe the streaming behaviour
        }
    }
}

My consuming "middle BFF API" is below. My issue is that the breakpoint in the consuming API is not hit after each `yield return`; that is, control does not return to the consuming API after each yield. I want the consuming API to receive each message as soon as it is yielded from the first API above.

Consuming api code -

[HttpGet]
[Route("v1/testendpoint")]
public async Task Get()
{
    using HttpClient client = new();
    using HttpResponseMessage response = await client.GetAsync(
        "http://localhost...",
        HttpCompletionOption.ResponseHeadersRead
    ).ConfigureAwait(false);

    response.EnsureSuccessStatusCode();

    Stream responseStream = await response.Content.ReadAsStreamAsync().ConfigureAwait(false);

    IAsyncEnumerable<object> messages = JsonSerializer.DeserializeAsyncEnumerable<object>(
        responseStream,
        new JsonSerializerOptions
        {
            PropertyNameCaseInsensitive = true,
            DefaultBufferSize = 10
        });

    Response.Headers.Add("Content-Type", "text/event-stream");

    await foreach (var message in messages)
    {
        // breakpoint here is not hit once per yielded item
        byte[] messageBytes = Encoding.ASCII.GetBytes("data:" + message + "\n\n");
        await Response.Body.WriteAsync(messageBytes, 0, messageBytes.Length);
        await Response.Body.FlushAsync();
    }
}
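For reference, a byte-level relay that skips JSON deserialization entirely and just forwards chunks as they arrive would look roughly like this (a sketch only; the route and upstream URL are placeholders):

```csharp
[HttpGet]
[Route("v1/relay")]
public async Task Relay(CancellationToken cancellationToken)
{
    // Hypothetical relay: forward bytes from the upstream API as they arrive,
    // flushing after every chunk so the client sees data immediately.
    using HttpClient client = new();
    using HttpResponseMessage upstream = await client.GetAsync(
        "http://localhost:7200/v1/testendpoint",   // placeholder URL
        HttpCompletionOption.ResponseHeadersRead,
        cancellationToken);
    upstream.EnsureSuccessStatusCode();

    Response.Headers.Add("Content-Type", "text/event-stream");

    await using Stream source = await upstream.Content.ReadAsStreamAsync(cancellationToken);
    var buffer = new byte[4096];
    int read;
    while ((read = await source.ReadAsync(buffer, cancellationToken)) > 0)
    {
        await Response.Body.WriteAsync(buffer.AsMemory(0, read), cancellationToken);
        await Response.Body.FlushAsync(cancellationToken); // push each chunk downstream
    }
}
```

This avoids the serializer's buffering in the BFF altogether, at the cost of not inspecting the payload.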

Could someone please explain why this is happening?

I tried adding a delay to check whether control returns to the consuming API after each yield, but it does not.

I also tried hitting the first API (the one that yields) with the client-side code below, and it receives data in batches.

fetch("http://localhost:7200/v1...", config)
  .then(async response => {
    const reader = response.body?.getReader();
    if (!reader) {
      return;
    }
    const decoder = new TextDecoder();
    while (true) {
      const { done, value } = await reader.read();
      if (done) break;
      // strip the surrounding JSON array brackets and any leading comma
      const item = decoder.decode(value).replace(/\[|]/g, '').replace(/^,/, '');
      const parsedItem = JSON.parse(item);
      console.log(parsedItem + "\n");
      debugger;
    }
    reader.releaseLock();
  }, (reason) => {
    console.log(reason);
    debugger;
  });
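Note that `reader.read()` chunk boundaries don't align with JSON values, so `JSON.parse` on a raw chunk can fail mid-item. A sketch of buffering partial data, assuming the server emits one JSON value per line (NDJSON); `drainLines` is a hypothetical helper name:

```javascript
// Hypothetical helper: append a decoded chunk to the leftover buffer and
// return all complete newline-terminated JSON items plus the new leftover.
function drainLines(leftover, chunk) {
  const lines = (leftover + chunk).split("\n");
  const rest = lines.pop();                        // last piece may be a partial item
  const items = lines
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line));
  return { items, rest };
}

// Usage inside the read loop:
//   let rest = "";
//   const { items, rest: next } = drainLines(rest, decoder.decode(value, { stream: true }));
//   rest = next;
//   items.forEach((item) => console.log(item));
```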

In the first (sending) API, the `GetTextStreaming` method has the following definition (screenshot omitted).

UPDATE:

I am now trying to return the stream directly - myapi code:

public async Task<Stream> GetRawStreamingCompletionResponse()
{
    var rawAzureOpenAIRequest = new CompletionsRequest();
    rawAzureOpenAIRequest.ModelToUse = DefaultTextModelToUse;
    CompletionsOptions optns = new CompletionsOptions();
    optns.Prompts.Add("add 6+1 :");
    optns.Prompts.Add("below is the summary of technical consultant role in software");

    var azResponse = await _openAIRepository
        .GetStreamingResponse(rawAzureOpenAIRequest.ModelToUse, optns, canToken);

    return azResponse.GetRawResponse().ContentStream;
}

In the consuming API -

public async Task Get()
{
    var stream = await Client.GetStreamAsync("http://localhost...");
    Response.Headers.Add("Content-Type", "text/event-stream");
    // note: CopyToAsync must be awaited, otherwise the action can
    // complete before the copy finishes
    await stream.CopyToAsync(this.Response.Body);
    await Response.Body.FlushAsync();
}
  • Which version of .net? Can you post more of your api definition so we can see how the response is generated? Have you tested the api with any other http client to check if the result is streamed? – Jeremy Lakeman May 17 '23 at 04:48
  • Added the info. Regarding testing with another HTTP client, are you referring to testing the `GetStreamingCompletionResponse()` method in the first API? – s_v May 17 '23 at 05:26
  • 1
    In theory the MVC json serialiser should "flush every time IAsyncEnumerator.MoveNextAsync() returns an incomplete task" https://github.com/dotnet/aspnetcore/issues/32483#issuecomment-849508926 But have you tested this at a http protocol level, rather than through your 2nd layer of C# code. – Jeremy Lakeman May 17 '23 at 05:34
  • I tested the first api that yields with a client side code, and it returns in batches like in this post https://stackoverflow.com/questions/76062528/consuming-an-iasyncenumerable-that-makes-an-async-call-to-another-service-or-api/76233750#76233750. But it does return before all yields are completed and then goes back to yield more results. – s_v May 17 '23 at 06:38
  • @JeremyLakeman I have tested with a JavaScript client; it is getting at least 2 chunks of data. I have updated the client code I used for testing in the question. I think I might not be reading the stream correctly in the consuming API. – s_v May 17 '23 at 09:08
  • 1
    Just because `GetStreamingCompletionResponse` returns an `IAsyncEnumerable` doesn't mean there's a streaming response. There's no such thing in HTTP, the protocol itself. Using `IAsyncEnumerable` allows writing each result *snippet* to the response as soon as it's available instead of waiting for the entire list of results to be available, without blocking. If you wanted to return a single JSON object per line you'd need extra configuration and the *client* would have to read the response stream as a *stream*, trying to detect the newlines before parsing – Panagiotis Kanavos May 17 '23 at 09:23
  • @PanagiotisKanavos Thank you, could you please suggest what the extra configuration for "one line or one JSON" at a time could be? I am trying to use the streaming feature of this OpenAI API to improve the user experience. My use case is to send from a "1st api using openai" to a "middle api (BFF)" to a React client-side application. I was trying to follow this post - https://www.tpeczek.com/2021/07/aspnet-core-6-and-iasyncenumerable.html – s_v May 17 '23 at 09:50

1 Answer


I think I found the reason behind it. It depends on when the System.Text.Json library flushes the response body while serializing an IAsyncEnumerable: it does not flush the response body after every step of the async enumeration, but only when an internal buffer limit of the JSON serializer is reached.

The documented default buffer size is 16,384 bytes.

I handled it by flushing the Response after every yield.
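A minimal sketch of that workaround, writing each item to the response body and flushing explicitly instead of relying on the serializer's buffer (the route and the `GetItemsAsync` item source are placeholders standing in for the OpenAI streaming calls):

```csharp
[HttpGet]
[Route("v1/streamed")]
public async Task GetStreamed(CancellationToken cancellationToken)
{
    Response.Headers.Add("Content-Type", "text/event-stream");

    // Hypothetical item source standing in for the OpenAI streaming calls.
    await foreach (string message in GetItemsAsync(cancellationToken))
    {
        byte[] payload = Encoding.UTF8.GetBytes($"data: {message}\n\n");
        await Response.Body.WriteAsync(payload, cancellationToken);
        // Flush after every item so it leaves the server immediately,
        // instead of waiting for the serializer's 16,384-byte buffer to fill.
        await Response.Body.FlushAsync(cancellationToken);
    }
}
```

With this shape the framework never serializes the IAsyncEnumerable itself, so its internal buffering no longer applies.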

s_v