I'm trying to read through a 2.5GB XML file and delete certain nodes, let's say the "CD" elements and the "DVD" elements. Currently I'm doing something like this:

using (XmlReader reader = XmlReader.Create("file.xml"))
{
    DeleteElements(reader.ReadElements("CD"));
    DeleteElements(reader.ReadElements("DVD")); // reader returns 0 elements
}

Note: DeleteElements just loops over these elements and removes them from the document, but that's mostly unimportant for the purposes of this question.

Currently I find that no "DVD" elements are retrieved. If you've worked with XmlReader much before, I'm sure you can figure out the cause of the problem: after the reader has read through the document looking for "CD" nodes, it is at the end of the document, so it doesn't find any "DVD" elements.

Considering the large size of the XML file and the number of elements I'm trying to retrieve, I can't load the entire document into memory because I'd get an OutOfMemoryException. This means no XDocument or XPathDocument goodness.

Is there any way to get XmlReader to return both "CD" and "DVD" elements as it reads through the document? Reading through the document is quite time-consuming, so I don't want to do it multiple times. Something awesome like reader.ReadElements("DVD|CD") would be sweet.

ajbeaven

1 Answer

XmlReader is a forward-only XML parser. A ReadElements method that collects every "CD" element has to run the reader to the end of the document, so there is nothing left to read when you then ask for the "DVD" elements. With that approach you would have to run over your file twice.

Basic usage of XmlReader:

using (XmlReader reader = XmlReader.Create("input.xml")) {
  while (reader.Read()) {
    switch (reader.NodeType) {
    case XmlNodeType.Element:
      switch (reader.Name) {
      case "CD":
        // do something with a CD
        break;
      case "DVD":
        // do something with a DVD
        break;
      default:
        // do something with all other elements
        break;
      }
      break;
    }
  }
}

What are you doing in the DeleteElements method? You probably need to create an XmlWriter for a new temp file, write every element except the ones you'd like to delete to that temp file, and finally replace the original file with the temp file.

This way you have one loop over all elements, including the deletion (exclusion) of some.
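A minimal sketch of that single pass, in case it helps. The names XmlFilter, CopyWithout, WriteShallowNode and FilterFile are made up for this illustration and are not part of the framework, and the temp file name (path + ".tmp") is just an assumption. It copies node by node instead of using WriteNode, so that skipped elements nested deeper in the tree are dropped as well. Only the common node types are handled; the XML declaration and any DOCTYPE are not copied (the writer emits its own declaration unless told otherwise).

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml;

public static class XmlFilter
{
    // Copies everything from reader to writer in one forward pass,
    // leaving out any element (with its whole subtree) whose name is in namesToDrop.
    public static void CopyWithout(XmlReader reader, XmlWriter writer, ISet<string> namesToDrop)
    {
        bool more = reader.Read();
        while (more)
        {
            if (reader.NodeType == XmlNodeType.Element && namesToDrop.Contains(reader.Name))
            {
                // Skip() jumps past the matching end tag and already lands on
                // the next node, so do not call Read() again here.
                reader.Skip();
                more = !reader.EOF;
                continue;
            }
            WriteShallowNode(reader, writer);
            more = reader.Read();
        }
    }

    // Writes only the current node, not its children.
    private static void WriteShallowNode(XmlReader reader, XmlWriter writer)
    {
        switch (reader.NodeType)
        {
            case XmlNodeType.Element:
                bool isEmpty = reader.IsEmptyElement;
                writer.WriteStartElement(reader.Prefix, reader.LocalName, reader.NamespaceURI);
                writer.WriteAttributes(reader, true);
                if (isEmpty) writer.WriteEndElement();
                break;
            case XmlNodeType.EndElement:
                writer.WriteFullEndElement();
                break;
            case XmlNodeType.Text:
                writer.WriteString(reader.Value);
                break;
            case XmlNodeType.CDATA:
                writer.WriteCData(reader.Value);
                break;
            case XmlNodeType.Whitespace:
            case XmlNodeType.SignificantWhitespace:
                writer.WriteWhitespace(reader.Value);
                break;
            case XmlNodeType.Comment:
                writer.WriteComment(reader.Value);
                break;
            case XmlNodeType.ProcessingInstruction:
                writer.WriteProcessingInstruction(reader.Name, reader.Value);
                break;
            // Other node types (XML declaration, DOCTYPE, ...) are ignored in this sketch.
        }
    }

    // Filters a file via a temp file, then swaps the temp file in for the original.
    public static void FilterFile(string path, ISet<string> namesToDrop)
    {
        string tempPath = path + ".tmp"; // hypothetical temp file name
        using (XmlReader reader = XmlReader.Create(path))
        using (XmlWriter writer = XmlWriter.Create(tempPath))
        {
            CopyWithout(reader, writer, namesToDrop);
        }
        // Both files are closed at this point, so the swap is safe.
        File.Replace(tempPath, path, null);
    }
}
```

You would call it as XmlFilter.FilterFile("file.xml", new HashSet<string> { "CD", "DVD" }). Memory use stays flat because only the current node is held at any time.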

metadings
  • Great answer - I ended up doing exactly what you suggested here, so I should have come back and answered it myself! – ajbeaven Dec 21 '12 at 10:59