11

I want to iterate over a batch of requests, sending each one to an external API using the HttpClient class:

    foreach (var myRequest in RequestsBatch)
    {
        try
        {
            HttpClient httpClient = new HttpClient();
            httpClient.Timeout = TimeSpan.FromMilliseconds(5);
            HttpResponseMessage response = await httpClient.PostAsJsonAsync<string>(
                string.Format("{0}api/GetResponse", endpoint), myRequest);
            JObject resultResponse = await response.Content.ReadAsAsync<JObject>();
        }
        catch (Exception ex)
        {
            continue;
        }
    }

The context here is that I need to set a very small timeout value, so that if the response takes longer than that, we simply get the "Task was cancelled" exception and continue iterating.
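
Since the timeout surfaces as a TaskCanceledException, a minimal sketch of the loop that swallows only the timeout case (reusing RequestsBatch, endpoint and myRequest from above, and keeping a single HttpClient outside the loop) would be:

    var httpClient = new HttpClient { Timeout = TimeSpan.FromMilliseconds(5) };

    foreach (var myRequest in RequestsBatch)
    {
        try
        {
            var response = await httpClient.PostAsJsonAsync<string>(
                string.Format("{0}api/GetResponse", endpoint), myRequest);
            var resultResponse = await response.Content.ReadAsAsync<JObject>();
        }
        catch (TaskCanceledException)
        {
            // HttpClient reports its timeout as a cancellation; skip this request.
            continue;
        }
    }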

Now, in the code above, comment out these two lines:

    HttpResponseMessage response = await httpClient.PostAsJsonAsync<string>(string.Format("{0}api/GetResponse", endpoint), myRequest);
    JObject resultResponse = await response.Content.ReadAsAsync<JObject>();

The iteration ends very fast. Uncomment them and try again. It takes a lot of time.

I wonder: does calling the PostAsJsonAsync/ReadAsAsync methods with await take more time than the timeout value itself?

Based on the answer below, on the assumption that it will create different threads, we have this method:

    public Task<JObject> GetResponse(string endPoint, JObject request, TimeSpan timeout)
    {
        return Task.Run(async () =>
        {
            try
            {
                HttpClient httpClient = new HttpClient();
                httpClient.Timeout = TimeSpan.FromMilliseconds(5);
                HttpResponseMessage response = await httpClient.PostAsJsonAsync<JObject>(
                    string.Format("{0}api/GetResponse", endPoint), request).WithTimeout<HttpResponseMessage>(timeout);
                JObject resultResponse = await response.Content.ReadAsAsync<JObject>().WithTimeout<JObject>(timeout);
                return resultResponse;
            }
            catch (Exception ex)
            {
                return new JObject() { new JProperty("ControlledException", "Invalid response.") };
            }
        });
    }
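
WithTimeout is not a framework method; assuming it is meant to fail the awaited task when it does not complete within the given interval, one possible sketch of such an extension is:

    public static class TaskTimeoutExtensions
    {
        // Sketch only: returns the task's result if it finishes in time,
        // otherwise throws a TimeoutException. The underlying task keeps
        // running in the background; it is not cancelled.
        public static async Task<T> WithTimeout<T>(this Task<T> task, TimeSpan timeout)
        {
            var finished = await Task.WhenAny(task, Task.Delay(timeout));
            if (finished != task)
                throw new TimeoutException("The operation did not complete in time.");
            return await task;
        }
    }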

When the exception is raised there, the exception JObject should be returned very fast; however, when the HttpClient methods are used, even though the exception is raised, it still takes a lot of time. Is there some behind-the-scenes processing affecting the Task even though the return value is just a simple exception JObject?

If so, what other approach could be used to send a batch of requests to an API very fast?

svick
Alberto Montellano

2 Answers

38

I agree with the accepted answer in that the key to speeding things up is to run the requests in parallel. But any solution that forces additional threads into the mix by use of Task.Run or Parallel.ForEach is not gaining you any efficiency with I/O bound asynchronous operations. If anything it's hurting.

You can easily get all calls running concurrently while letting the underlying async subsystems decide how many threads are required to complete the tasks as efficiently as possible. Chances are that number is much smaller than the number of concurrent calls, because they don't require any thread at all while they're awaiting a response.

Further, the accepted answer creates a new instance of HttpClient for each call. Don't do that either - bad things can happen.

Here's a modified version of the accepted answer:

var httpClient = new HttpClient {
    Timeout = TimeSpan.FromMilliseconds(5)
};

var taskList = new List<Task<JObject>>();

foreach (var myRequest in RequestsBatch)
{
    // by virtue of not awaiting each call, you've already achieved parallelism
    taskList.Add(GetResponseAsync(endPoint, myRequest));
}

try
{
    // asynchronously wait until all tasks are complete
    await Task.WhenAll(taskList.ToArray());
}
catch (Exception ex)
{
}

async Task<JObject> GetResponseAsync(string endPoint, string myRequest)
{
    // no Task.Run here!
    var response = await httpClient.PostAsJsonAsync<string>(
        string.Format("{0}api/GetResponse", endpoint), 
        myRequest);
    return await response.Content.ReadAsAsync<JObject>();
}
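
If the individual responses are also needed, awaiting Task.WhenAll returns them directly; a small variation of the waiting block above (a sketch reusing taskList, and assuming a using System.Linq directive for the fallback) could be:

    JObject[] results;
    try
    {
        // Awaiting WhenAll over Task<JObject> yields the responses in request order.
        results = await Task.WhenAll(taskList);
    }
    catch (Exception)
    {
        // Awaiting WhenAll rethrows the first failure; the tasks that did succeed
        // can still be read individually from taskList.
        results = taskList.Where(t => t.Status == TaskStatus.RanToCompletion)
                          .Select(t => t.Result)
                          .ToArray();
    }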
Todd Menier
  • 3
    not sure what was the suggestion at the time of answer, but as of today it is recommended **not** to create many new `HttpClient`s, and instead try to reuse the same one as much as possible. – superjos Mar 28 '18 at 09:56
  • cool. Just for the record, I very recently read about (yet) another change in this story: with the upcoming netcore 2.1 you can technically create as many `HttpClient`s as you want; those are not really expensive. The factory instead takes care of controlling creation of the internal `HttpClientHandler`, which is the *actual* expensive component. Here's a [post about it](https://www.stevejgordon.co.uk/introduction-to-httpclientfactory-aspnetcore) – superjos May 04 '18 at 23:58
  • I've done some testing on Tasks and async/await and found that they are really just queueing work to ThreadPool. They will always put the work on a background thread. That thread will suspend while waiting for the I/O operation to return. Suspended threads still count towards ThreadPool max threads. It is a good solution though because you don't have the overhead of creating new threads since you are running on ThreadPool. The disadvantage to this solution is that you are subject to how ThreadPool is configured since it is a static class. – Chris Rollins Jun 24 '18 at 08:06
2

It doesn't look like you're actually running a separate thread for each request. Try something like this:

var taskList = new List<Task<JObject>>();

foreach (var myRequest in RequestsBatch)
{
    taskList.Add(GetResponse(endPoint, myRequest));
}

try
{
    Task.WaitAll(taskList.ToArray());
}
catch (Exception ex)
{
}

public Task<JObject> GetResponse(string endPoint, string myRequest)
{
    return Task.Run(async () =>
        {
            HttpClient httpClient = new HttpClient();

            HttpResponseMessage response = await httpClient.PostAsJsonAsync<string>(
                 string.Format("{0}api/GetResponse", endPoint), 
                 myRequest, 
                 new CancellationTokenSource(TimeSpan.FromMilliseconds(5)).Token);

            return await response.Content.ReadAsAsync<JObject>();
        });
}
RagtimeWilly
  • If you place a line of code after `Task.WaitAll(taskList.ToArray());`, will it get hit in less than 1 sec? In my case it gets hit after 20 seconds. I would like the timeout to be honored. I asked about a solution similar to the one you propose here: http://stackoverflow.com/questions/29102274/c-sharp-async-await-calls-using-httpclient-with-timeout – Alberto Montellano Mar 17 '15 at 21:57
  • It depends on the size of the ThreadPool and how many requests timeout. If you're making a request that's regularly timing out after 5 seconds how could you possibly make 300 requests in under a second? – RagtimeWilly Mar 17 '15 at 22:00
  • Just because you try to start 300 threads at once doesn't mean they will actually all run at the same time. They'll be throttled by the thread pool - some won't start until others have finished. Log the number of active threads to see what I mean. – RagtimeWilly Mar 17 '15 at 22:02
  • This is a very good article on the subject: https://msdn.microsoft.com/en-us/magazine/ff960958.aspx – RagtimeWilly Mar 17 '15 at 22:04
  • Right, but in the sample, if the timeout expires, it returns an exception and we "do nothing" when the exception is raised (in my case I plan to return a default empty JObject), so the threads should be freed very fast. Let's suppose we don't use HttpClient: the threads finish very fast. Now if we use HttpClient and call its methods, no matter what timeout we define for it, it takes a lot of time. That's what I'm asking. – Alberto Montellano Mar 17 '15 at 22:05
  • Say for example you have 300 requests to make and you call the above code to create a Task for each one. Depending on hardware etc only 100 of these might run at once (the other 200 wait). Then when the first 100 complete the others start and the last 100 wait. Then finally the last 100 start. So the time out for the last 100 might not even start until 10 seconds after you created the Task. This is why you're seeing delays of more than 5s. Does this make sense? – RagtimeWilly Mar 17 '15 at 22:13
  • I got your point, but my timeout value is 5 milliseconds; even if they ran 5 ms each, one by one, it would take 1000 milliseconds, 1 second in total. I see a delay of around 15-20 seconds; it seems the timeout set on HttpClient doesn't matter. I imagine that if a request takes more than 5 milliseconds the timeout expires and the exception is raised, so we "do nothing", then the thread is available, and a new 5-millisecond operation runs. So, if I have 200 I'd expect them to finish in around a second, not 20 seconds. – Alberto Montellano Mar 17 '15 at 22:20
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/73210/discussion-between-ragtimewilly-and-alberto-montellano). – RagtimeWilly Mar 17 '15 at 22:59
  • I updated the answer slightly. There is overhead in creating threads, deserialization and throwing/catching exceptions, so you shouldn't think that because you set the timeout to 5 ms every task will take exactly 5 ms and the next one will start immediately. But the above solution gave me the best results: I was able to make 300 requests in ~3 seconds. – RagtimeWilly Mar 17 '15 at 23:01
  • 7
    Operations on `HttpClient` are I/O bound and inherently asynchronous. Forcing each call to run on a different thread by using `Task.Run` is gaining you nothing in terms of efficiency or overall speed. If anything it's probably hurting. – Todd Menier Mar 23 '15 at 15:53
  • 1
    "It doesn't look like you're actually running a seperate thread for each request." Separate threads are not needed to make requests in batches. See the answer by @Todd Menier http://blog.stephencleary.com/2013/11/there-is-no-thread.html – Jerry Joseph May 31 '16 at 20:47