There is an XML document that is passed through a named pipe. The XML document is large, about 500 megabytes. The structure of the document is roughly like this:
<Root>
<SomeElement/>
<?pi?>
<NewMessage>
<A>
<B></B>
</A>
</NewMessage>
<?pi?>
<NewMessage>
<A>
<B></B>
</A>
</NewMessage>
<?pi?>
<NewMessage>
<A>
<B></B>
</A>
</NewMessage>
</Root>
Each new message starts with a processing instruction. I want to detect the processing instructions before opening an XML reader, and then open a separate XML reader for each message that starts with a processing instruction. The goal is to check whether the tags within each message are balanced, and to skip any message in which they are not. So, if there is a document like this:
<Root>
<SomeElement/>
<?pi?>
<NewMessage>
<A>
<B>
<?pi?>
<NewMessage>
<A>
<B></B>
</A>
</NewMessage>
<?pi?>
<NewMessage>
<A>
<B></B>
</A>
</NewMessage>
</Root>
Then the first message should be discarded, and all the remaining messages should be kept along with the rest of the XML document. So the result should be:
<Root>
<SomeElement/>
<?pi?>
<NewMessage>
<A>
<B></B>
</A>
</NewMessage>
<?pi?>
<NewMessage>
<A>
<B></B>
</A>
</NewMessage>
</Root>
I want to use an XmlReader with a name table, but it is not clear how to preprocess the stream in order to identify the processing instructions in advance. Thank you in advance.
I tried something like the following, but it obviously has problems with a large XML document: only a single 4096-byte buffer is read, so not every message gets processed by the XML reader.
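using System.IO;
using System.Text;
using System.Text.RegularExpressions;
using System.Xml;

// pipe is the already-connected named pipe stream.
XmlNameTable nameTable = new NameTable();
byte[] buffer = new byte[4096];
// Only one read of at most 4096 bytes: anything beyond it is never seen.
int bytesRead = pipe.Read(buffer, 0, buffer.Length);
string input = Encoding.UTF8.GetString(buffer, 0, bytesRead);
// Finds only the first processing instruction in the buffer.
Match piMatch = Regex.Match(input, "<\\?pi.*?\\?>");
if (piMatch.Success)
{
    string pi = piMatch.Value;
    // Everything after the first PI, including any later messages and a possibly cut-off tail.
    string xml = input.Substring(piMatch.Index + piMatch.Length);
    using (XmlReader reader = XmlReader.Create(new StringReader(xml), new XmlReaderSettings { NameTable = nameTable }))
    {
        while (reader.Read())
        {
            // ...
        }
    }
}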
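For reference, the behaviour described above (split at each processing instruction, check each message on its own, drop the unbalanced ones) might be sketched roughly as follows. This is only a minimal sketch under several assumptions, not a tested solution: `MessageFilter`, `SplitOnMarker`, `IsWellFormed` and `Filter` are hypothetical names, the text `<?pi` is assumed never to appear inside comments, CDATA sections or other processing instructions, each single message is assumed small enough to buffer in memory, and the last `</` in the stream is assumed to be the closing tag of the root element.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;
using System.Xml;

static class MessageFilter
{
    // Assumption: every message starts with a PI whose target begins with "pi".
    const string Marker = "<?pi";

    // Streams the input and yields chunks that each start at a marker
    // (the first chunk is whatever precedes the first marker).
    public static IEnumerable<string> SplitOnMarker(TextReader input)
    {
        var chunk = new StringBuilder();
        int c;
        while ((c = input.Read()) != -1)
        {
            chunk.Append((char)c);
            if (chunk.Length > Marker.Length && EndsWith(chunk, Marker))
            {
                chunk.Length -= Marker.Length;      // marker belongs to the next chunk
                yield return chunk.ToString();
                chunk.Clear().Append(Marker);
            }
        }
        if (chunk.Length > 0) yield return chunk.ToString();
    }

    static bool EndsWith(StringBuilder sb, string s)
    {
        if (sb.Length < s.Length) return false;
        for (int i = 0; i < s.Length; i++)
            if (sb[sb.Length - s.Length + i] != s[i]) return false;
        return true;
    }

    // Parses one chunk as an XML fragment; unbalanced tags raise XmlException.
    public static bool IsWellFormed(string fragment, XmlNameTable nameTable)
    {
        var settings = new XmlReaderSettings
        {
            ConformanceLevel = ConformanceLevel.Fragment,
            NameTable = nameTable
        };
        try
        {
            using (var reader = XmlReader.Create(new StringReader(fragment), settings))
            {
                while (reader.Read()) { }
            }
            return true;
        }
        catch (XmlException)
        {
            return false;
        }
    }

    // Copies the header, every well-formed message, and the root end tag to output.
    public static void Filter(TextReader input, TextWriter output)
    {
        var nameTable = new NameTable();
        string held = null;   // held back until we know whether it is the last chunk
        foreach (string chunk in SplitOnMarker(input))
        {
            if (held != null) WriteIfValid(held, output, nameTable);
            held = chunk;
        }
        if (held == null) return;

        // Assumption: the last "</" in the stream opens the root's end tag.
        int cut = held.LastIndexOf("</", StringComparison.Ordinal);
        string body = cut >= 0 ? held.Substring(0, cut) : held;
        string trailer = cut >= 0 ? held.Substring(cut) : "";
        WriteIfValid(body, output, nameTable);
        output.Write(trailer);
    }

    static void WriteIfValid(string chunk, TextWriter output, XmlNameTable nameTable)
    {
        // Text before the first marker (the root start tag etc.) is passed through unchecked.
        if (!chunk.StartsWith(Marker, StringComparison.Ordinal) || IsWellFormed(chunk, nameTable))
            output.Write(chunk);
    }
}
```

With the pipe wrapped in a `StreamReader` (for example `new StreamReader(pipe, Encoding.UTF8)`), `MessageFilter.Filter(reader, writer)` would read character by character, so the whole 500 MB document is never held in memory at once. Sharing one `NameTable` across all the per-message readers keeps element names atomized only once.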