
Problem summary: I need to call HTTP resource A while reusing the name resolution from a previous HTTP request to resource B on the same host.

CASE 1. Consecutive calls to the same resource are faster after the first call. The profiler tells me that the difference between the 1st and 2nd call goes to DNS name resolution (GetHostAddresses). [profiler screenshot]

var request = (HttpWebRequest)WebRequest.Create("https://www.somehost.com/resources/b.txt");
using (var response = (HttpWebResponse)request.GetResponse()) {}

var request2 = (HttpWebRequest)WebRequest.Create("https://www.somehost.com/resources/b.txt");
using (var response = (HttpWebResponse)request2.GetResponse()) {}

CASE 2. Consecutive calls to different resources on the same host show the same delay. The profiler tells me that both of them incur DNS name resolution calls.

var request = (HttpWebRequest)WebRequest.Create("https://www.somehost.com/resources/a.txt");
using (var response = (HttpWebResponse)request.GetResponse()) {}

var request2 = (HttpWebRequest)WebRequest.Create("https://www.somehost.com/resources/b.txt");
using (var response = (HttpWebResponse)request2.GetResponse()) {}

I wonder why, in case 2, the second call can't use the DNS cache from the first call. It's the same host.

And the main question: how do I change that?

EDIT: The behaviour above also applies to the HttpClient class. It turned out that this is specific to the few web servers I use and does not happen on other servers. I can't figure out what specifically happens, but I suspect the web servers in question (Amazon CloudFront and Akamai) force-close the connection after the response has been served, ignoring my requests' keep-alive headers. I am going to close this for now, as it is not possible to formulate a clear question.
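One way to test the suspicion about force-closed connections is to look at whether the server answers with a `Connection: close` header. The following is only a minimal sketch, assuming .NET 5 and `HttpClient`; the class and method names (`KeepAliveCheck`, `ServerClosesConnection`, `ReportAsync`) are hypothetical, and the URL passed in would be one of the placeholder URLs above. A server may also drop the connection without announcing it, so a missing header does not prove that the connection is kept alive.

```csharp
using System;
using System.Net.Http;
using System.Threading.Tasks;

public static class KeepAliveCheck
{
    // Hypothetical helper: true when the response carries "Connection: close",
    // i.e. the server announces that it will not keep the connection open.
    public static bool ServerClosesConnection(HttpResponseMessage response)
    {
        return response.Headers.ConnectionClose == true;
    }

    // Fetches only the headers of the given URL and prints what the server reported.
    public static async Task ReportAsync(HttpClient client, string url)
    {
        using HttpResponseMessage response =
            await client.GetAsync(url, HttpCompletionOption.ResponseHeadersRead);
        Console.WriteLine(
            $"{url}: HTTP/{response.Version}, Connection: close = {ServerClosesConnection(response)}");
    }
}
```

Calling `ReportAsync` for a.txt and b.txt with the same `HttpClient` instance would show whether the servers in question announce that they close the connection.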

Boppity Bop
  • DNS should be cached. Connections are even kept open and re-used between different HttpWebRequests if possible. Do you have a web proxy set up for your user account? If you have a corporate proxy, it might actually execute some JavaScript for each address, to check if it should go through the proxy or not. And I'm not sure if that calculation is cached. – gnud Jan 25 '21 at 19:14
  • No proxy. I think the problem is with exactly http requests pool. The connection to resource A being reused in 2nd request to resource A. Hence no name resolution 2nd time. – Boppity Bop Jan 25 '21 at 19:18
  • I was wrong about the connections "automatically" being re-used. But I was right about the DNS caching. When I create two httpwebrequests to different files on the same host, one after the other, only one DNS lookup is done. You can verify this yourself, by enabling verbose tracing for System.Net.Sockets and looking for 'Entering DNS::TryInternalResolve'. – gnud Jan 25 '21 at 20:14
  • nope.. cant be true. did you do it in .NET 5? if you look at the call stack on the picture - there is `HttpConnectionPool` class - this one keeps socket open for 15sec (found by trial). So if you call host/a and host/a - the 2nd call is a lot faster because it doesnt need name resolution (because the socket is still open from 1st call).. And in contrary if you call host/a and host/b - both call DNS name resolution - so the DNS calls are not cached in any case! if you are sure I am wrong - please post answer with your code snipped Ill test it.. – Boppity Bop Jan 25 '21 at 20:22
  • Sorry, I did this in 4.8. Let me figure out how on earth I trace stuff in 5 :) – gnud Jan 25 '21 at 20:23
  • dont trace. just time both calls with `Stopwatch` - difference is huge. call host/a then host/a and then compare to host/a and host/b calls – Boppity Bop Jan 25 '21 at 20:24

1 Answer


Your problem doesn't exist for System.Net.Http.HttpClient; try it instead. It can reuse existing connections (no DNS cache is needed for such calls), which looks like exactly what you want to achieve. As a bonus, it supports HTTP/2 (which can be enabled with a property assignment when the HttpClient instance is created).

WebRequest is ancient and not recommended by Microsoft for new development. In .NET 5, HttpClient is noticeably faster (twice as fast?).

Create the HttpClient instance once per application (link).

private static readonly HttpClient client = new HttpClient();

The equivalent of your request. Note that await is available only in methods marked as async.

string text = await client.GetStringAsync("https://www.somehost.com/resources/b.txt");

You can also run multiple requests at once without spawning concurrent threads.

string[] urls = new string[]
{ 
    "https://www.somehost.com/resources/a.txt",
    "https://www.somehost.com/resources/b.txt"
};
List<Task<string>> tasks = new List<Task<string>>();
foreach (string url in urls)
{
    tasks.Add(client.GetStringAsync(url));
}
string[] results = await Task.WhenAll(tasks);

If you're not familiar with asynchronous programming (e.g. async/await), start with this article.

You can also limit how many requests are processed at once. Let's make the same request 1000 times with a limit of 10 concurrent requests.

static async Task Main(string[] args)
{
    Stopwatch sw = new Stopwatch();
    string url = "https://www.somehost.com/resources/a.txt";
    using SemaphoreSlim semaphore = new SemaphoreSlim(10);
    List<Task<string>> tasks = new List<Task<string>>();
    sw.Start();
    for (int i = 0; i < 1000; i++)
    {
        await semaphore.WaitAsync();
        tasks.Add(GetPageAsync(url, semaphore));
    }
    string[] results = await Task.WhenAll(tasks);
    sw.Stop();
    Console.WriteLine($"Elapsed: {sw.ElapsedMilliseconds}ms");
}

private static async Task<string> GetPageAsync(string url, SemaphoreSlim semaphore)
{
    try
    {
        return await client.GetStringAsync(url);
    }
    finally
    {
        semaphore.Release();
    }
}

You may measure the time.
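To compare the two cases from the question (the same resource twice versus two different resources on the same host), each call can be timed on its own. This is only a sketch under stated assumptions: `RequestTiming`, `TimeAsync` and `CompareAsync` are hypothetical names, and the URLs passed in would be the placeholder URLs from the question.

```csharp
using System;
using System.Diagnostics;
using System.Net.Http;
using System.Threading.Tasks;

public static class RequestTiming
{
    // One shared instance, so that pooled connections can be reused between calls.
    private static readonly HttpClient client = new HttpClient();

    // Hypothetical helper: awaits the given asynchronous action and returns how long it took.
    public static async Task<TimeSpan> TimeAsync(Func<Task> action)
    {
        Stopwatch sw = Stopwatch.StartNew();
        await action();
        sw.Stop();
        return sw.Elapsed;
    }

    // Requests the two URLs one after the other and prints the duration of each call.
    public static async Task CompareAsync(string firstUrl, string secondUrl)
    {
        TimeSpan first = await TimeAsync(() => client.GetStringAsync(firstUrl));
        TimeSpan second = await TimeAsync(() => client.GetStringAsync(secondUrl));
        Console.WriteLine(
            $"1st: {first.TotalMilliseconds:F0} ms, 2nd: {second.TotalMilliseconds:F0} ms");
    }
}
```

Running `CompareAsync` once with a.txt twice and once with a.txt followed by b.txt should show whether the second call is faster in both cases.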

aepot
  • thank you. a very generous answer. i did try it 3 ways - httpwebrequest, webclient and httpclient (one instance per app yes). all showed same issue. let me try to write a usable code so you can see for yourself. i think http requests pool cache the same way as sql connections pool - ie caching exact same strings. so a.txt and b.txt would be cached as 2 different connections.. – Boppity Bop Jan 26 '21 at 15:32
  • I am going to accept your answer even if it doesnt help in my particular case (see edit). it seems you are right - some webservers behave as you describe (and as I expected). but the servers i need - they dont. HttpClient or not.. – Boppity Bop Jan 26 '21 at 15:57
  • @BoppityBop Thanks. jfyi, `HttpWebRequest` and `WebClient` use the same underlying old HTTP engine. `HttpClient` is powered by the modern `SocketsHttpHandler`. – aepot Jan 26 '21 at 16:34
  • idk about that. if you look at the profiler tree on the pic - httpwebrequest uses httpclient inside.. (dont trust everything you read on the web ;) ) – Boppity Bop Jan 26 '21 at 16:57
  • `httpwebrequest uses httpclient` - OK, let it be, maybe my knowledge deprecated and it was as i think in .NET Core 3.1 but changed in .NET 5. Thanks. One thing i remember is twice performance boost after i was migrated from `HttpWebRequest` to `HttpClient`. – aepot Jan 26 '21 at 17:22
  • @BoppityBop got it. `HttpWebRequest` creates a new `HttpClient` per instance. That means a new `HttpClientHandler` per request, which means no reuse of the connections. – aepot Jan 26 '21 at 17:36
  • @BoppityBop btw, try HTTP/2, it can fix the closed connections issue, in case of server support. `= new HttpClient() { DefaultRequestVersion = new Version(2, 0) }`. – aepot Jan 26 '21 at 20:24