
I have an API that I pull large datasets from (millions of records).

I am using XmlReader like this:

XmlReaderSettings settings = new XmlReaderSettings();
settings.IgnoreWhitespace = true;

using (XmlReader reader = XmlReader.Create(uri, settings))
{
    reader.ReadStartElement("DATASET");

    while (!reader.EOF)
    {
        if (reader.NodeType == XmlNodeType.Element)
        {
            try
            {
                XElement elR = XNode.ReadFrom(reader) as XElement;

                //PROCESS XML AND DO WHATEVER WITH IT

            }
            catch (Exception ex)
            {
                Log.WriteLine(ex.Message);
                Log.WriteLine(ex.StackTrace);
            }
        }
        else
        {
            reader.Read();
        }
    }
}

It works most of the time, but if I get a result set of over 1.5 million records, I CONSISTENTLY get the following error, recorded in my log by the catch block in the code above.

Error message: Unable to read data from the transport connection: The connection was closed.

Stack trace:

   at System.Net.ConnectStream.Read(Byte[] buffer, Int32 offset, Int32 size)
   at System.Xml.XmlRegisteredNonCachedStream.Read(Byte[] buffer, Int32 offset, Int32 count)
   at System.Xml.XmlTextReaderImpl.ReadData()
   at System.Xml.XmlTextReaderImpl.ParseText(Int32& startPos, Int32& endPos, Int32& outOrChars)
   at System.Xml.XmlTextReaderImpl.ParseText()
   at System.Xml.XmlTextReaderImpl.ParseElementContent()
   at System.Xml.XmlTextReaderImpl.Read()
   at System.Xml.XmlReader.ReadStartElement(String name)
   at My_Files.DataBuilder.GetDataFromAPI(Dictionary`2 pData, String uri, String type, String siteid, String mlid, String mid, DateTime start, DateTime end)
   at My_Files.DataBuilder.GetTransactionData(Dictionary`2 pData, String type, String siteID, String mlid, String mid, String apiurl, String apipass, DateTime start, DateTime end)
   at My_Files.Program.GetAndProcessData(String transactionFile, Dictionary`2 pData, String siteid, String mlid, List`1 dmids, DateTime dStartDate, DateTime dEndDate)
   at My_Files.Program.Run()

What is going on, and what can I do to pull large XML data sets?

John Saunders
richard
    Maybe [it is a timeout](http://stackoverflow.com/questions/3547931/prevent-or-handle-time-out-with-xmlreader-createuri)? – chue x Dec 21 '13 at 20:34
  • Yeah, it could be. 1. How would I know that? 2. If it is, how do I fix it? – richard Dec 21 '13 at 20:46
  • I suppose you could try connecting using the accepted answer in the linked question. If it works, it's likely a timeout. I'm not sure how to fix it, other than the way specified in the linked question. – chue x Dec 21 '13 at 20:48
  • It looks like that answer is pulling back all data at once though. Is that right? GetResponse gets the entire response? It would cause an outofmemoryexception for me because the data is often larger than the 2GB limit for .net apps. – richard Dec 21 '13 at 20:54
  • Good point. I missed the fact that your code does not pull back everything. – chue x Dec 21 '13 at 20:57
  • `GetResponse` only gets the entire header, `GetResponseStream` gets the body as a stream. `XmlReader` should immediately start processing. – C.Evenhuis Dec 21 '13 at 22:22
  • I have edited your title. Please see, "[Should questions include “tags” in their titles?](http://meta.stackexchange.com/questions/19190/)", where the consensus is "no, they should not". – John Saunders Dec 22 '13 at 00:15
  • Also, just display `ex.ToString()` instead of `Message` and `StackTrace`. – John Saunders Dec 22 '13 at 00:16
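For reference, the timeout approach discussed in the comments can be combined with streaming: build the `HttpWebRequest` yourself, raise its timeouts, and hand the response stream to `XmlReader.Create` so nothing is buffered in memory. A sketch (the URL and the 30-minute timeout values are placeholders, not values from the question):

```csharp
using System;
using System.Net;
using System.Xml;
using System.Xml.Linq;

class StreamingFetch
{
    static void Main()
    {
        // Build the request manually so both timeouts can be raised beyond
        // the defaults (Timeout = 100 s, ReadWriteTimeout = 300 s).
        var request = (HttpWebRequest)WebRequest.Create("http://example.com/DATASET");
        request.Timeout = 30 * 60 * 1000;          // time allowed until the response starts
        request.ReadWriteTimeout = 30 * 60 * 1000; // time allowed between reads of the stream

        var settings = new XmlReaderSettings { IgnoreWhitespace = true };

        using (var response = (HttpWebResponse)request.GetResponse())
        using (var stream = response.GetResponseStream())
        using (var reader = XmlReader.Create(stream, settings))
        {
            // Same streaming loop as in the question: elements are read
            // one at a time, never the whole document at once.
            reader.ReadStartElement("DATASET");
            while (!reader.EOF)
            {
                if (reader.NodeType == XmlNodeType.Element)
                {
                    var elR = XNode.ReadFrom(reader) as XElement;
                    // process elR
                }
                else
                {
                    reader.Read();
                }
            }
        }
    }
}
```

`GetResponse` returns as soon as the headers arrive, so this does not pull the whole body into memory; the `ReadWriteTimeout` is what governs long pauses while the server is still generating data mid-stream.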

1 Answer


It appears that it is taking too long for the called process to generate the XML.

You did not indicate what the value of uri is. Many web servers have a short timeout before the called process should start returning data. If you are building up the entire document on the server, it could be taking too long.

Just because you are streaming the result with the reader doesn't mean that the server is streaming the data.

I would focus attention on the called process.

Brad Bruce
  • Thanks Brad. Unfortunately I have no control over the called process. It is a 3rd party vendor. – richard Dec 22 '13 at 05:03
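Since the called process belongs to a third-party vendor, one client-side workaround, assuming the API honors the `start`/`end` parameters the code already passes, is to split a large date range into smaller windows so each response finishes before the server times out. A hypothetical helper (`DateRangeSplitter` is not from the question):

```csharp
using System;
using System.Collections.Generic;

static class DateRangeSplitter
{
    // Split [start, end) into consecutive windows of at most chunkDays days,
    // so each API call returns a smaller, faster-to-generate result set.
    public static IEnumerable<Tuple<DateTime, DateTime>> Split(
        DateTime start, DateTime end, int chunkDays)
    {
        var cursor = start;
        while (cursor < end)
        {
            var next = cursor.AddDays(chunkDays);
            if (next > end) next = end;
            yield return Tuple.Create(cursor, next);
            cursor = next;
        }
    }
}

class Program
{
    static void Main()
    {
        // One GetDataFromAPI-style call per window instead of one giant call.
        foreach (var window in DateRangeSplitter.Split(
            new DateTime(2013, 1, 1), new DateTime(2013, 1, 10), 4))
        {
            Console.WriteLine("{0:yyyy-MM-dd} .. {1:yyyy-MM-dd}",
                window.Item1, window.Item2);
        }
    }
}
```

Whether this helps depends entirely on the vendor's API: if the server builds the full document before sending regardless of range size, smaller windows are the only lever left on the client side.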