How to parse XML from HttpWebRequest asynchronously?

Question

My main program runs 8 tasks using Task.Factory.StartNew.

Each task requests an XML result from a web service, then parses it into a collection that can be written to MSSQL using a TVP.

The program works, but the efficiency gain from using the TPL isn't what I expected. After placing stopwatches at various points, it seems to me that the tasks are interfering with each other, maybe one blocking another. All the figures point to the download section, which uses HttpWebRequest.

After searching and reading a bit about asynchronous programming in C#, I modified my code to run the download section asynchronously, but the result still shows a similar level of blocking to the version without asynchronous code.

I found 3 ways of coding this, with a few reference links:

How to use HttpWebRequest (.NET) asynchronously? - With this method, I pass the XDocument around using the custom object in the download method.

Asynchronous Programming in C# using Iterators http://tomasp.net/blog/csharp-async.aspx - A string/stream is returned and parsed using XDocument.Load/Parse in the main method.

The code below shows the last method I found, as implemented in my code.

Main class that starts the tasks

private static void test() {
    DBReader dbReader = new DBReader();
    Dictionary<string, DateTime> jobs = dbReader.getJob();
    JobHandler jh = new JobHandler();
    Stopwatch swCharge = new Stopwatch();
    Stopwatch swDetail = new Stopwatch();
    Stopwatch swHeader = new Stopwatch();
    //more stopwatch

    Task[] tasks = new Task[] {
        Task.Factory.StartNew(() => jh.processData<RawChargeCollection, RawCharge>(jobs["RawCharge"], 15, swCharge)),
        Task.Factory.StartNew(() => jh.processData<RawDetailCollection, RawDetail>(jobs["RawDetail"], 15, swDetail)),
        Task.Factory.StartNew(() => jh.processData<RawHeaderCollection, RawHeader>(jobs["RawHeader"], 15, swHeader))
    };
    Task.WaitAll(tasks);
}

The processData method

public void processData<T, S>(DateTime x, int mins, Stopwatch sw)
    where T : List<S>, new()
    where S : new() {
    DateTime start = x;
    DateTime end = x.AddMinutes(mins);
    string fromDate, toDate;
    StringBuilder str = new StringBuilder();
    XMLParser xmlParser = new XMLParser();
    DBWriter dbWriter = new DBWriter();
    while (end <= DateTime.UtcNow) {
        fromDate = String.Format("{0:yyyy'-'MM'-'dd HH':'mm':'ss}", start);
        toDate = String.Format("{0:yyyy'-'MM'-'dd HH':'mm':'ss}", end);
        try {
            sw.Restart();
            WebserviceClient ws = new WebserviceClient();

            XDocument xDoc = null;
            var task = ws.GetRawData<S>(fromDate, toDate);
            // task.Result blocks this thread until the download has finished
            xDoc = XDocument.Parse(task.Result);
            //show the download time

            sw.Restart();
            T rawData = xmlParser.ParseXML<T, S>(xDoc);

            if (rawData.Count != 0) {
                sw.Restart();
                dbWriter.writeRawData<T, S>(rawData, start, end);
                //log success
            }
            else {
                //log no data
            }
        }
        catch (Exception e) {
            //log fail
        }
        finally {
            // advance to the next time window whether or not this one succeeded
            start = start.AddMinutes(mins);
            end = end.AddMinutes(mins);
        }
    }
}

GetRawData is only responsible for constructing the URL used by GetData.

Download data section:

private static Task<string> GetData(string param) {
    string url = String.Format("my working URL/{0}", param);
    HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
    request.MediaType = "application/xml";
    // Wrap the BeginGetResponse/EndGetResponse pair in a Task<WebResponse>
    Task<WebResponse> task = Task.Factory.FromAsync(
        request.BeginGetResponse,
        asyncResult => request.EndGetResponse(asyncResult),
        (object)null);

    // Once the response has arrived, read its body synchronously in a continuation
    return task.ContinueWith(t => ReadStreamFromResponse(t.Result));
}

private static string ReadStreamFromResponse(WebResponse response) {
    using (Stream responseStream = response.GetResponseStream())
    using (StreamReader sr = new StreamReader(responseStream)) {
        //Need to return this response
        string strContent = sr.ReadToEnd();
        return strContent;
    }
}

In the processData method I timed the code that downloads from the web service. A download takes from 400 ms to 100000 ms, normally around 3000 ms to 8000 ms. If I run just 1 task, the client processing time is only slightly longer than the server processing time.

However, when running more tasks, a download that takes 450 ms to 3000 ms (or whatever) on the server can take 8000 ms to 90000 ms for the client to complete.

In my scenario the bottleneck should be on the server side, but my log shows that it is on the client.

Most articles I found on asynchronous programming in C# demonstrate reading and handling a stream/string, with no example for XML. Is my code failing because of XML? If not, what is the problem with my code?
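For reference, XML parsing can be chained onto a download task in the same way as any other string handling. The following is a minimal sketch, assuming a `Task<string>` such as the one returned by GetData above; the class and method names (`XmlContinuation`, `ParseWhenDone`) are hypothetical and not part of the question's code. It uses only .NET 4.0 APIs.

```csharp
using System;
using System.Threading.Tasks;
using System.Xml.Linq;

// Hypothetical helper, not part of the question's code.
public static class XmlContinuation
{
    // Returns a task that completes with the parsed document once the
    // download task has produced its string. If the download faults,
    // reading t.Result rethrows and the returned task faults as well.
    public static Task<XDocument> ParseWhenDone(Task<string> download)
    {
        return download.ContinueWith(t => XDocument.Parse(t.Result));
    }
}
```

Calling `.Result` or `.Wait()` on the returned task still blocks the calling thread, just as `task.Result` does in processData, so this sketch on its own is not expected to change the timings described above.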

EDIT: Yes, my dev machine and the users'/target machines run XP, so moving to .NET 4.5 or the CTP is too much.

ServicePointManager.DefaultConnectionLimit and the app.config connectionManagement element seem to be the same thing, so I picked app.config since that can be changed.

At first, changing the max connections helped greatly but didn't really solve the problem. After timing the code block with Thread.Sleep(random), it seems the 'blocking' isn't related to the concurrent code.

processData first downloads from the web service (max connections are needed here), then does some minor mapping, and finally writes to the DB. Writing to the DB never takes over 1 second, which is nothing compared to the download, but after I also set max connections for the DB (the same number as for the web service), the waiting suddenly disappeared.

So the max connections to the DB also matter. But I do not understand why DB writes of 150-600 ms can cause waits of over 20 seconds.

What confuses me even more is that the waiting time was in the download block, not in the DB write block.

given the 4.0 tag, I'm assuming this isn't an option for you, but if/when you can start using 4.5, the new HttpClient class is 'natively' async and makes async handling of responses much easier IMHO - http://msdn.microsoft.com/en-us/library/system.net.http.httpclient(VS.110).aspx — James Manning, Jun 15 '12 at 09:55

score 1 · Accepted Answer · edited May 23 '17 at 11:55

I would go back to the simpler form, at least for debugging, where they were each 'normal'/synchronous code. Since you'll be worst-case blocking 8 threads unnecessarily, I wouldn't consider that a big deal just yet.
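A rough sketch of what that simpler, synchronous form could look like, assuming the URL is built elsewhere as in the question. The names `SyncXmlDownload`, `Download` and `Parse` are made up for illustration, and the request headers set in the question's GetData are left out here.

```csharp
using System;
using System.IO;
using System.Net;
using System.Xml.Linq;

// Hypothetical helper for debugging, not the asker's actual code.
public static class SyncXmlDownload
{
    // Blocking download: the calling thread waits for the response,
    // then parses the XML straight from the response stream.
    public static XDocument Download(string url)
    {
        HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
        using (WebResponse response = request.GetResponse())
        using (Stream body = response.GetResponseStream())
        {
            return Parse(body);
        }
    }

    // Kept separate so the parsing step can be timed or tested
    // without a network call.
    public static XDocument Parse(Stream body)
    {
        return XDocument.Load(body);
    }
}
```

With this form each of the 8 tasks simply holds its thread for the duration of the download, which makes the stopwatch figures easier to interpret.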

I'd imagine what you're hitting is instead the default behavior of limiting the number of concurrent requests.

From this related SO thread...

Max number of concurrent HttpWebRequests

...you may want to look at what Jon Skeet pointed to, the connectionManagement element:

http://msdn.microsoft.com/en-us/library/fb6y0fyc.aspx

The connectionManagement element defines the maximum number of connections to a server or group of servers.

Also, Jon's recommendation to replace the http calls with just Thread.Sleep to see if the concurrency is affected is excellent. If your 8 tasks can all do parallel Thread.Sleep calls, then your issue isn't 'top-level' concurrency, but instead a restriction enforced by what they're doing (like the default concurrent connection limit).
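The Thread.Sleep experiment could be sketched roughly as follows. This is an illustration only: `SleepConcurrencyProbe` and `Measure` are hypothetical names, and the task count and sleep time are placeholders to be adjusted to the real workload.

```csharp
using System;
using System.Diagnostics;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical probe, not part of the question's code.
public static class SleepConcurrencyProbe
{
    // Starts taskCount tasks that each sleep for sleepMs and returns
    // the wall-clock time (in ms) until all of them have finished.
    public static long Measure(int taskCount, int sleepMs, TaskCreationOptions options)
    {
        Stopwatch sw = Stopwatch.StartNew();
        Task[] tasks = new Task[taskCount];
        for (int i = 0; i < taskCount; i++)
        {
            tasks[i] = Task.Factory.StartNew(() => Thread.Sleep(sleepMs), options);
        }
        Task.WaitAll(tasks);
        return sw.ElapsedMilliseconds;
    }
}
```

Calling `SleepConcurrencyProbe.Measure(8, 3000, TaskCreationOptions.None)` mirrors the way the question starts its tasks. A result close to 3000 ms suggests the tasks do run in parallel, while a result approaching 8 x 3000 ms would point at the task scheduling itself. Passing `TaskCreationOptions.LongRunning` instead gives a comparison run that hints to the scheduler that the work may need its own thread.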

Or set the limit using `ServicePointManager.DefaultConnectionLimit`. — svick, Jun 15 '12 at 11:52
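In code, that could look like the sketch below. The class and method names are hypothetical, and the value 8 in the usage note is just the number of tasks from the question, not a recommendation.

```csharp
using System;
using System.Net;

// Hypothetical helper, shown only to illustrate the property.
public static class ConnectionLimitSetup
{
    // Sets the default per-host connection limit and returns the
    // previous value so it can be logged or restored.
    public static int Raise(int limit)
    {
        int previous = ServicePointManager.DefaultConnectionLimit;
        ServicePointManager.DefaultConnectionLimit = limit;
        return previous;
    }
}
```

A call such as `ConnectionLimitSetup.Raise(8)` belongs at program start, before the first request is created: the new default applies only to ServicePoint objects created after the change, so a host that has already been contacted keeps its old limit.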
