I have a batch of URLs that I want to fetch. The list contains more than 50,000 URLs with different domain names, but all domains resolve to the same load-balanced server IP.
For each URL I want to log its result code, its fetch duration, the hash of the content, and its redirect headers.
The current approach achieves around 10 fetches per second, with response times of around half a second.
How can I have the following execute faster?
I currently have the following code construction:
Parallel.ForEach(domainnames, ProcessItem);
ProcessItem is based on the following:
static void Fetch2(Uri url)
{
    HttpWebResponse response;
    try
    {
        var request = (HttpWebRequest)WebRequest.Create(url);
        request.AllowAutoRedirect = false;
        response = (HttpWebResponse)request.GetResponse();
    }
    catch (WebException ex)
    {
        response = ex.Response as HttpWebResponse;
    }
    if (response == null) return;
    using (response)
    {
        // Process response.....
    }
}
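For context, the "Process response" step could record the four values mentioned above roughly like this (a sketch; the `Log` sink is a hypothetical placeholder, and the duration would be measured around the request with a `Stopwatch`):

```csharp
using System;
using System.Net;
using System.Security.Cryptography;

static void ProcessResponse(Uri url, HttpWebResponse response, TimeSpan duration)
{
    // The Location header is only present on 3xx redirect responses.
    string redirectTarget = response.Headers["Location"];

    // Hash the response body so identical content can be detected.
    string contentHash;
    using (var sha = SHA256.Create())
    using (var stream = response.GetResponseStream())
    {
        contentHash = BitConverter.ToString(sha.ComputeHash(stream)).Replace("-", "");
    }

    // Log is a hypothetical sink for the results.
    Log(url, (int)response.StatusCode, duration, contentHash, redirectTarget);
}
```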
I have the following configuration applied:
<system.net>
<connectionManagement>
<add address="*" maxconnection="100" />
</connectionManagement>
</system.net>
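The same connection limit can also be set programmatically via `ServicePointManager`, along with two settings that are commonly tuned for high-volume fetching (a sketch of settings to experiment with, not a guaranteed fix):

```csharp
using System.Net;

// Programmatic equivalent of the connectionManagement config above.
ServicePointManager.DefaultConnectionLimit = 100;

// Skip the Expect: 100-Continue handshake, saving a round trip per POST/PUT.
ServicePointManager.Expect100Continue = false;

// Disable Nagle's algorithm so small requests are sent immediately.
ServicePointManager.UseNagleAlgorithm = false;
```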
I tried the following:
- Limiting the Parallel.ForEach by specifying new ParallelOptions { MaxDegreeOfParallelism = 25 }, as I thought I might be making too many web requests at once, but lowering it even further does not improve performance.
- Applying async with Task.WaitAll(Task[]), but this results in lots of errors: all tasks get created very fast, and almost all of them fail with connection errors.
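The async attempt was along these lines (a sketch; the unthrottled version starts all 50,000+ requests nearly at once, which matches the connection errors observed):

```csharp
using System;
using System.Linq;
using System.Net;
using System.Threading.Tasks;

static async Task Fetch2Async(Uri url)
{
    var request = (HttpWebRequest)WebRequest.Create(url);
    request.AllowAutoRedirect = false;
    try
    {
        using (var response = (HttpWebResponse)await request.GetResponseAsync())
        {
            // Process response.....
        }
    }
    catch (WebException ex)
    {
        using (var response = ex.Response as HttpWebResponse)
        {
            if (response == null) return;
            // Process response.....
        }
    }
}

// All tasks start immediately, so nearly every request is in flight at once:
// Task.WaitAll(domainnames.Select(Fetch2Async).ToArray());
```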
Interesting observations are:
- My internet connection is not really under load, so it is not congested.
- CPU, memory and IO usage are unremarkable as well, although IO shows occasional dips.