3

We have a problem where we need a request/response pattern while using the TPL Dataflow library. Our problem is we have a .NET core API that calls a dependent service. The dependent service limits concurrent requests. Our API does not limit concurrent requests; therefore, we could receive thousands of requests at a time. In this case, the dependent service would reject requests after reaching its limit. Therefore, we implemented a BufferBlock<T> and a TransformBlock<TIn, TOut>. The performance is solid and works great. We tested our API front end with 1000 users issuing 100 requests/sec with 0 problems. The buffer block buffers requests and the transform block executes in parallel our desired amount of requests. The dependency service receives our requests and responds. We return that response in the transform block action and all is well. Our problem is the buffer block and transform block are disconnected which means requests/responses are not in sync. We are experiencing an issue where a request will receive the response of another requester (please see the code below).

Specific to the code below, our problem lies in the GetContent method. That method is called from a service layer in our API which ultimately is called from our controller. The code below and the service layer are singletons. The SendAsync to the buffer is disconnected from the transform block ReceiveAsync so that arbitrary responses are returned and not necessarily the request that was issued.

So, our question is: Is there a way using the dataflow blocks to correlate request/responses? The ultimate goal is a request comes in to our API, gets issued to the dependent service, and is returned to the client. The code for our data flow implementation is below.

public class HttpClientWrapper : IHttpClientManager
{
    private readonly IConfiguration _configuration;
    private readonly ITokenService _tokenService;
    private HttpClient _client;

    private BufferBlock<string> _bufferBlock;
    private TransformBlock<string, JObject> _actionBlock;

    public HttpClientWrapper(IConfiguration configuration, ITokenService tokenService)
    {
        _configuration = configuration;
        _tokenService = tokenService;

        _bufferBlock = new BufferBlock<string>();

        var executionDataFlowBlockOptions = new ExecutionDataflowBlockOptions
        {
            MaxDegreeOfParallelism = 10
        };

        var dataFlowLinkOptions = new DataflowLinkOptions
        {
            PropagateCompletion = true
        };

        _actionBlock = new TransformBlock<string, JObject>(t => ProcessRequest(t),
            executionDataFlowBlockOptions);

        _bufferBlock.LinkTo(_actionBlock, dataFlowLinkOptions);
    }

    public void Connect()
    {
        _client = new HttpClient();

        _client.DefaultRequestHeaders.Add("x-ms-client-application-name",
            "ourappname");
    }

    public async Task<JObject> GetContent(string request)
    {
        await _bufferBlock.SendAsync(request);

        var result = await _actionBlock.ReceiveAsync();

        return result;
    }

    private async Task<JObject> ProcessRequest(string request)
    {
        if (_client == null)
        {
            Connect();
        }

        try
        {
            var accessToken = await _tokenService.GetTokenAsync(_configuration);

            var httpRequestMessage = new HttpRequestMessage(HttpMethod.Post,
                new Uri($"https://{_configuration.Uri}"));

            // add the headers
            httpRequestMessage.Headers.Add("Authorization", $"Bearer {accessToken}");
            // add the request body
            httpRequestMessage.Content = new StringContent(request, Encoding.UTF8,
                "application/json");

            var postRequest = await _client.SendAsync(httpRequestMessage);

            var response = await postRequest.Content.ReadAsStringAsync();

            return JsonConvert.DeserializeObject<JObject>(response);
        }
        catch (Exception ex)
        {
            // log error

            return new JObject();
        }
    }
}
Theodor Zoulias
  • 34,835
  • 7
  • 69
  • 104
Adam Jones
  • 63
  • 6

3 Answers3

2

What you have to do is tag each incoming item with an id so that you can correlate the data input to the result output. Here's an example of how to do that:

namespace ConcurrentFlows.DataflowJobs {
    using System;
    using System.Collections.Concurrent;
    using System.Collections.Generic;
    using System.Threading.Tasks;
    using System.Threading.Tasks.Dataflow;

    /// <summary>
    /// A generic interface defining that:
    /// for a specified input type => an awaitable result is produced.
    /// </summary>
    /// <typeparam name="TInput">The type of data to process.</typeparam>
    /// <typeparam name="TOutput">The type of data the consumer expects back.</typeparam>
    public interface IJobManager<TInput, TOutput> {
        Task<TOutput> SubmitRequest(TInput data);
    }

    /// <summary>
    /// A TPL-Dataflow based job manager.
    /// </summary>
    /// <typeparam name="TInput">The type of data to process.</typeparam>
    /// <typeparam name="TOutput">The type of data the consumer expects back.</typeparam>
    public class DataflowJobManager<TInput, TOutput> : IJobManager<TInput, TOutput> {

        /// <summary>
        /// It is anticipated that jobHandler is an injected
        /// singleton instance of a Dataflow based 'calculator', though this implementation
        /// does not depend on it being a singleton.
        /// </summary>
        /// <param name="jobHandler">A singleton Dataflow block through which all jobs are processed.</param>
        public DataflowJobManager(IPropagatorBlock<KeyValuePair<Guid, TInput>, KeyValuePair<Guid, TOutput>> jobHandler) {
            if (jobHandler == null) { throw new ArgumentException("Argument cannot be null.", "jobHandler"); }

            this.JobHandler = JobHandler;
            if (!alreadyLinked) {
                JobHandler.LinkTo(ResultHandler, new DataflowLinkOptions() { PropagateCompletion = true });
                alreadyLinked = true;
            }
        }

        private static bool alreadyLinked = false;            

        /// <summary>
        /// Submits the request to the JobHandler and asynchronously awaits the result.
        /// </summary>
        /// <param name="data">The input data to be processd.</param>
        /// <returns></returns>
        public async Task<TOutput> SubmitRequest(TInput data) {
            var taggedData = TagInputData(data);
            var job = CreateJob(taggedData);
            Jobs.TryAdd(job.Key, job.Value);
            await JobHandler.SendAsync(taggedData);
            return await job.Value.Task;
        }

        private static ConcurrentDictionary<Guid, TaskCompletionSource<TOutput>> Jobs {
            get;
        } = new ConcurrentDictionary<Guid, TaskCompletionSource<TOutput>>();

        private static ExecutionDataflowBlockOptions Options {
            get;
        } = GetResultHandlerOptions();

        private static ITargetBlock<KeyValuePair<Guid, TOutput>> ResultHandler {
            get;
        } = CreateReplyHandler(Options);

        private IPropagatorBlock<KeyValuePair<Guid, TInput>, KeyValuePair<Guid, TOutput>> JobHandler {
            get;
        }

        private KeyValuePair<Guid, TInput> TagInputData(TInput data) {
            var id = Guid.NewGuid();
            return new KeyValuePair<Guid, TInput>(id, data);
        }

        private KeyValuePair<Guid, TaskCompletionSource<TOutput>> CreateJob(KeyValuePair<Guid, TInput> taggedData) {
            var id = taggedData.Key;
            var jobCompletionSource = new TaskCompletionSource<TOutput>();
            return new KeyValuePair<Guid, TaskCompletionSource<TOutput>>(id, jobCompletionSource);
        }

        private static ExecutionDataflowBlockOptions GetResultHandlerOptions() {
            return new ExecutionDataflowBlockOptions() {
                MaxDegreeOfParallelism = Environment.ProcessorCount,
                BoundedCapacity = 1000
            };
        }

        private static ITargetBlock<KeyValuePair<Guid, TOutput>> CreateReplyHandler(ExecutionDataflowBlockOptions options) {
            return new ActionBlock<KeyValuePair<Guid, TOutput>>((result) => {
                RecieveOutput(result);
            }, options);
        }

        private static void RecieveOutput(KeyValuePair<Guid, TOutput> result) {
            var jobId = result.Key;
            TaskCompletionSource<TOutput> jobCompletionSource;
            if (!Jobs.TryRemove(jobId, out jobCompletionSource)) {
                throw new InvalidOperationException($"The jobId: {jobId} was not found.");
            }
            var resultValue = result.Value;
            jobCompletionSource.SetResult(resultValue);            
        }
    }
}

Also see this answer for reference.

JSteward
  • 6,833
  • 2
  • 21
  • 30
1

Doing a simple throttling is not a particularly enticing use case for the TPL Dataflow library, and using instead a SemaphoreSlim seems simpler and more attractive. But if you wanted more features, like for example enforcing a minimum duration for each request, or having a way for waiting all the pending requests to complete, then the TPL Dataflow could have something to offer that the SemaphoreSlim couldn't. The basic idea is to avoid passing naked input values to the block, and trying later to associate them with the produced results. It is much safer to create tasks immediately upon request, send the tasks to an ActionBlock<Task>, and let the block activate and await asynchronously these tasks using its specified MaxDegreeOfParallelism. This way the input value and its result will be unambiguously tied together forever.

public class ThrottledExecution<T>
{
    private readonly ActionBlock<Task<Task<T>>> _actionBlock;
    private readonly CancellationToken _cancellationToken;

    public ThrottledExecution(int concurrencyLevel, int minDurationMilliseconds = 0,
        CancellationToken cancellationToken = default)
    {
        if (minDurationMilliseconds < 0) throw new ArgumentOutOfRangeException();
        _actionBlock = new ActionBlock<Task<Task<T>>>(async task =>
        {
            try
            {
                var delay = Task.Delay(minDurationMilliseconds, cancellationToken);
                task.RunSynchronously();
                await task.Unwrap().ConfigureAwait(false);
                await delay.ConfigureAwait(false);
            }
            catch { } // Ignore exceptions (errors are propagated through the task)
        }, new ExecutionDataflowBlockOptions()
        {
            MaxDegreeOfParallelism = concurrencyLevel,
            CancellationToken = cancellationToken,
        });
        _cancellationToken = cancellationToken;
    }

    public Task<T> Run(Func<Task<T>> function)
    {
        // Create a cold task (the function will be invoked later)
        var task = new Task<Task<T>>(function, _cancellationToken);
        var accepted = _actionBlock.Post(task);
        if (!accepted)
        {
            _cancellationToken.ThrowIfCancellationRequested();
            throw new InvalidOperationException(
                "The component has been marked as complete.");
        }
        return task.Unwrap();
    }

    public void Complete() => _actionBlock.Complete();
    public Task Completion => _actionBlock.Completion;
}

Usage example:

private ThrottledExecution<JObject> throttledExecution
    = new ThrottledExecution<JObject>(concurrencyLevel: 10);

public Task<JObject> GetContent(string request)
{
    return throttledExecution.Run(() => ProcessRequest(request));
}
Theodor Zoulias
  • 34,835
  • 7
  • 69
  • 104
0

I appreciate the answer provided by JSteward. His is a perfectly acceptable approach; however, I ended up doing this by using a SemaphoreSlim. SemaphoreSlim provides two things that allow this to be a powerful solution. First, it provides a constructor overload where you can send in a count. This count refers to the number of concurrent items that will be able to get past the semaphore waiting mechanism. The waiting mechanism is provided by a method called WaitAsync. With the below approach where the Worker class is as a Singleton, concurrent requests come in, are limited to 10 at a time executing the http request, and the responses are all returned to the correct requests. So an implementation might look like the following:

public class Worker: IWorker
{
    private readonly IHttpClientManager _httpClient;
    private readonly ITokenService _tokenService;

    private readonly SemaphoreSlim _semaphore;

    public Worker(IHttpClientManager httpClient, ITokenService tokenService)
    {
        _httpClient = httpClient;
        _tokenService = tokenService;

        // we want to limit the number of items here
        _semaphore = new SemaphoreSlim(10);
    }

    public async Task<JObject> ProcessRequestAsync(string request, string route)
    {
        try
        {
            var accessToken = await _tokenService.GetTokenAsync(
                _timeSeriesConfiguration.TenantId,
                _timeSeriesConfiguration.ClientId,
                _timeSeriesConfiguration.ClientSecret);

            var cancellationToken = new CancellationTokenSource();

            cancellationToken.CancelAfter(30000);

            await _semaphore.WaitAsync(cancellationToken.Token);
            var httpResponseMessage = await _httpClient.SendAsync(new HttpClientRequest
            {
                Method = HttpMethod.Post,
                Uri = $"https://someuri/someroute",
                Token = accessToken,
                Content = request
            });

            var response = await httpResponseMessage.Content.ReadAsStringAsync();

            return response;
        }
        catch (Exception ex)
        {
            // do some logging

            throw;
        }
        finally
        {
            _semaphore.Release();
        }
    }
}
Adam Jones
  • 63
  • 6
  • 2
    This implementation is problematic. An exception during the `_tokenService.GetTokenAsync` invocation will release the semaphore without acquiring it first, resulting to an incremented level of concurrency, which will open the window for rejections by the dependent service. The correct place for the `_semaphore.WaitAsync` command is just before entering the try/finally block. – Theodor Zoulias Jun 13 '20 at 23:01