I want to read an XML string, ignoring the header and the comments.

Ignoring the comments is simple, and I found a solution here. But I can't find any way to ignore the header.

Let me give an example:

Consider this XML:

<?xml version="1.0" encoding="iso-8859-1"?>
<!-- Some comments -->
<Tag Attribute="3">
    ...
</Tag>

I want to read the XML into a string, obtaining just the element "Tag" and any other elements, but without the "xml version" declaration and the comments.

The element "Tag" is only an example. There could be many others.

So, I want only this:

<Tag Attribute="3">
    ...
</Tag>

The code I've come up with so far:

XmlReaderSettings settings = new XmlReaderSettings();
settings.IgnoreComments = true;
XmlReader reader = XmlReader.Create("...", settings);
XmlDocument xmlDoc = new XmlDocument();
xmlDoc.Load(reader);

I can't find anything in XmlReaderSettings to do that.

Do I need to go node by node, choosing only the ones I want? Does this setting not exist?

EDIT 1: Just to summarize my problem: I need the contents of the XML to use in a CDATA section of a web service request. When I send the comments or the XML version, I get an error specific to that part of the XML. So I assume that if I read the XML without the version header and the comments, I'll be good to go.

Iúri dos Anjos
  • Very strange thing to ask - feels like an [XY problem](http://meta.stackexchange.com/questions/66377/what-is-the-xy-problem)... What is your real problem? Possibly, if the encoding is wrong, you can read the XML from a string instead... – Alexei Levenkov Oct 31 '14 at 18:34
  • `XmlReader` shouldn't be returning the header as a node. If you iterate through your nodes you shouldn't see the header. – MikeH Oct 31 '14 at 18:38
  • @Mihai: I needed to wait 2 days to accept my own answer. :) – Iúri dos Anjos Nov 03 '14 at 12:48

3 Answers

Here's a really simple solution.

using (var reader = XmlReader.Create(/*reader, stream, etc.*/))
{
    reader.MoveToContent();
    string content = reader.ReadOuterXml();
}
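
For anyone who wants to try this end to end, here is a minimal sketch that assumes the XML is already held in a string. The `input` value and the `<Child />` element are made up for illustration (the question elides the element body). `MoveToContent()` skips the declaration and any comments that precede the root element; setting `IgnoreComments` is added here on the assumption that comments nested inside the element should be dropped as well.

```csharp
using System;
using System.IO;
using System.Xml;

class Program
{
    static void Main()
    {
        // Hypothetical input modeled on the XML in the question.
        string input =
            "<?xml version=\"1.0\" encoding=\"iso-8859-1\"?>\n" +
            "<!-- Some comments -->\n" +
            "<Tag Attribute=\"3\"><!-- nested comment --><Child /></Tag>";

        // IgnoreComments makes the reader skip comment nodes entirely.
        var settings = new XmlReaderSettings { IgnoreComments = true };

        using (var reader = XmlReader.Create(new StringReader(input), settings))
        {
            // Advance past the declaration and whitespace to the root element.
            reader.MoveToContent();

            // Serialize the current element together with its children.
            string content = reader.ReadOuterXml();
            Console.WriteLine(content);
        }
    }
}
```

Because a well-formed document has exactly one root element, reading the outer XML of that element covers all of the element content in the document.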
Chris

Well, it seems that there is no setting to ignore the declaration, so I had to ignore it myself.

Here's the code I've written for those who might be interested:

private string _GetXmlWithoutHeadersAndComments(XmlDocument doc)
{
    string xml = null;

    // Loop through the child nodes and consider all but comments and declaration
    if (doc.HasChildNodes)
    {
        StringBuilder builder = new StringBuilder();

        foreach (XmlNode node in doc.ChildNodes)
            if (node.NodeType != XmlNodeType.XmlDeclaration && node.NodeType != XmlNodeType.Comment)
                builder.Append(node.OuterXml);

        xml = builder.ToString();
    }

    return xml;
}
Iúri dos Anjos

If you want to only get the Tag elements, you should just read the XML as normal, then find them using the XmlDocument's XPath capabilities.

For your xmlDoc object:

var nodes = xmlDoc.DocumentElement.SelectNodes("Tag");

You can then iterate through these like so:

foreach (XmlNode node in nodes) { }

Or, obviously, you could just put your SelectNodes query into the foreach loop, if you're never going to reuse the nodes object.

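This will return the Tag elements that are direct children of the document element, and you can do whatever you see fit with them. To match Tag elements at any depth, use the XPath expression "//Tag" instead.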
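
To show how the XPath expression decides which elements are matched, here is a small sketch. The document content (`Root`, `Group`, and the attribute values) is invented for illustration and is not from the question. A relative path such as `"Tag"` matches only direct children of the context node, while `"//Tag"` matches at any depth. In both cases the declaration and comments never appear in the results.

```csharp
using System;
using System.Xml;

class Program
{
    static void Main()
    {
        var xmlDoc = new XmlDocument();

        // Hypothetical document: one Tag directly under the root, one nested deeper.
        xmlDoc.LoadXml(
            "<?xml version=\"1.0\"?>" +
            "<!-- Some comments -->" +
            "<Root><Tag Attribute=\"3\" /><Group><Tag Attribute=\"4\" /></Group></Root>");

        // Relative path: only Tag elements that are direct children of <Root>.
        XmlNodeList direct = xmlDoc.DocumentElement.SelectNodes("Tag");

        // Descendant path: Tag elements anywhere in the document.
        XmlNodeList all = xmlDoc.SelectNodes("//Tag");

        Console.WriteLine(direct.Count); // prints 1
        Console.WriteLine(all.Count);    // prints 2

        foreach (XmlNode node in all)
        {
            // OuterXml contains only the element markup.
            Console.WriteLine(node.OuterXml);
        }
    }
}
```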

There's no need to ever encounter comments while using XmlDocument if you don't want to, and you're not going to end up getting results including either the header or the comments. Is there a particular reason you're trying to remove pieces of the XML before you begin parsing it?

Edit: Based on your edit, it seems like you're having a problem with the header giving an error when you try to pass it. You probably shouldn't straight-up remove the header, so your best option might be to change the header to one that you know works. You can change the header (declaration) like so:

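// The standalone argument is a string: "yes", "no", or null.
XmlDeclaration xmlDeclaration = yourDocument.CreateXmlDeclaration(
    yourVersion,
    yourEncoding,
    isStandalone);
// Assumes the document's first child is the existing declaration.
yourDocument.ReplaceChild(xmlDeclaration, yourDocument.FirstChild);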
furkle
  • I don't know which tags may exist. I know I can loop over all the nodes and use only the element ones. But there should be an easier way to do that, like the XmlReaderSettings I've shown. – Iúri dos Anjos Oct 31 '14 at 18:41
  • @IúridosAnjos An easier way to do what? Even with your edit, I'm not 100% sure I understand. – furkle Oct 31 '14 at 18:44
  • I was hoping that such a setting already existed, like "IgnoreComments" in XmlReaderSettings. – Iúri dos Anjos Oct 31 '14 at 18:48
  • @IúridosAnjos Check out what I just posted - I don't think it'll allow you to strip the header, nor should you, but you can at least modify it as you see fit. – furkle Oct 31 '14 at 18:49
  • Thanks! I've just accepted my own answer, as it loops through the top-level nodes and ignores comments and the declaration. It's more generic for my purposes. – Iúri dos Anjos Nov 03 '14 at 12:54