I traverse an HTML document with SgmlReader and XmlDocument. When I find an XmlNode whose type is Text, I need to replace its value with content that contains an XML element. I can't change InnerXml because it's read-only for text nodes. I tried changing InnerText instead, but then the tag delimiters < and > are encoded as &lt; and &gt;. For example:

<p>
    This is a text that will be highlighted.
    <anothertag />
    <......>
</p>

I'm trying to change to:

<p>
    This is a text that will be <span class="highlighted">highlighted</span>.
    <anothertag />
    <......>
</p>

What is the easiest way to modify the value of a text XmlNode?

oruchreis

3 Answers

2

I have a workaround. I don't know whether it is a proper solution, but it produces the result I want. Please comment on whether this code is a worthwhile solution or not:

    private void traverse(ref XmlNode node)
    {
        XmlNode prevOldElement = null;
        XmlNode prevNewElement = null;
        var element = node.FirstChild;
        do
        {
            if (prevNewElement != null && prevOldElement != null)
            {
                prevOldElement.ParentNode.ReplaceChild(prevNewElement, prevOldElement);
                prevNewElement = null;
                prevOldElement = null;
            }
            if (element.NodeType == XmlNodeType.Text)
            {
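                var el = doc.CreateElement("text");
                // Here is the manipulation of the InnerXml.
                // element.Value is unescaped text, so escape it first; a raw & or <
                // would otherwise make the InnerXml assignment throw.
                // (a_search_term is assumed to contain no XML special characters.)
                el.InnerXml = System.Security.SecurityElement.Escape(element.Value)
                    .Replace(a_search_term, "<b>" + a_search_term + "</b>");
                // I don't replace the element right away, because element.NextSibling would then be null.
                // So I replace it with the new element after getting the next sibling.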
                prevNewElement = el;
                prevOldElement = element;
            }
            else if (element.HasChildNodes)
                traverse(ref element);
        }
        while ((element = element.NextSibling) != null);
        if (prevNewElement != null && prevOldElement != null)
        {
            prevOldElement.ParentNode.ReplaceChild(prevNewElement, prevOldElement);
        }

    }

Also, I remove <text> and </text> strings after the traverse function:

        doc = new XmlDocument();
        doc.PreserveWhitespace = true;
        doc.XmlResolver = null;
        doc.Load(sgmlReader);
        var html = doc.FirstChild;
        traverse(ref html);
        textBox1.Text = doc.OuterXml.Replace("<text>", String.Empty).Replace("</text>", String.Empty);
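
A possible variation on the workaround above (a sketch, not part of the original answer): parsing the markup into an XmlDocumentFragment instead of a temporary <text> element should make the final string replacement of <text> and </text> unnecessary, because ReplaceChild inserts the fragment's children rather than the fragment itself. The class and method names here (FragmentReplacer, HighlightWithFragment) are made up for illustration, and the search term is assumed to be non-empty.

```csharp
using System;
using System.Security;
using System.Xml;

public static class FragmentReplacer
{
    // Hypothetical helper: replaces a text node with the nodes parsed from
    // its own text, with every occurrence of `term` wrapped in <b>.
    public static void HighlightWithFragment(XmlNode textNode, string term)
    {
        // Escape both strings so characters such as & and < stay valid
        // once the text is parsed as XML again.
        string escapedText = SecurityElement.Escape(textNode.Value);
        string escapedTerm = SecurityElement.Escape(term);

        XmlDocumentFragment fragment = textNode.OwnerDocument.CreateDocumentFragment();
        fragment.InnerXml = escapedText.Replace(escapedTerm, "<b>" + escapedTerm + "</b>");

        // The fragment's children take the place of the text node,
        // so no wrapper element is left in the document.
        textNode.ParentNode.ReplaceChild(fragment, textNode);
    }

    public static void Main()
    {
        var doc = new XmlDocument();
        doc.LoadXml("<p>This is a text that will be highlighted.</p>");
        HighlightWithFragment(doc.DocumentElement.FirstChild, "highlighted");
        Console.WriteLine(doc.DocumentElement.OuterXml);
    }
}
```

When this is called from a traversal loop, NextSibling still has to be read before the replacement, for the same reason given in the comments of the traverse function.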
oruchreis
1
using System;
using System.Xml;

public class Sample {

  public static void Main() {
    XmlDocument doc = new XmlDocument();
    doc.LoadXml(
    "<p>" +
    "This is a text that will be highlighted." +
    "<br />" +
    "<img />" +
    "</p>");
    // Placeholder; it is extended below until it no longer occurs in the parent's markup.
    string impossibleMark = "_*_";
    XmlNode elem = doc.DocumentElement.FirstChild;
    string theWord = "highlighted";
    if (elem.NodeType == XmlNodeType.Text) {
        string originalXml = elem.ParentNode.InnerXml;
        while (originalXml.Contains(impossibleMark)) impossibleMark += impossibleMark;
        // Step 1: swap the word for the placeholder, in the text node only.
        elem.InnerText = elem.InnerText.Replace(theWord, impossibleMark);
        string replaceString = "<span class=\"highlighted\">" + theWord + "</span>";
        // Step 2: swap the placeholder for markup in the parent's InnerXml, which is writable.
        elem.ParentNode.InnerXml = elem.ParentNode.InnerXml.Replace(impossibleMark, replaceString);
    }

    Console.WriteLine(doc.DocumentElement.InnerXml);
  }
}
BLUEPIXY
  • If InnerXml contains a tag named "highlighted", or if the search term is a tag name like span, your solution doesn't work. Also, I don't actually replace the text directly. I split the `text` into its words; I have a library that gives me the stem of a word, and I traverse every word in the `text` and stem it. So I need to work on text nodes only. Thanks anyway. – oruchreis Oct 26 '11 at 16:44
  • You are right about that. So I have made a small modification. I think this is sufficient in most cases. – BLUEPIXY Oct 26 '11 at 18:48
  • Strictly speaking, my approach may be useless in such cases, and I think your way is the one to use there. A simplified method probably would not be better. – BLUEPIXY Oct 26 '11 at 18:57
0

The InnerText property will give you the text content of all the child nodes of the XmlNode. What you really want to set is the InnerXml property, which will be construed as XML, not as text.

casperOne
  • Yes, thanks but if the XmlNode's type is `Text`, the InnerXml property is readonly. I need another solution. – oruchreis Oct 26 '11 at 13:17
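
Since InnerXml can't be set on a text node, one alternative is to skip markup strings entirely and restructure the node with DOM calls: XmlText.SplitText isolates the matching part of the text, and a new element is then wrapped around it. This is only a minimal sketch under stated assumptions: TextNodeHighlighter and WrapFirstMatch are made-up names, only the first occurrence is handled, and the span/class values are taken from the question's example.

```csharp
using System;
using System.Xml;

public static class TextNodeHighlighter
{
    // Hypothetical helper: wraps the first occurrence of `term` inside
    // `textNode` in <span class="highlighted">. Returns false if not found.
    public static bool WrapFirstMatch(XmlText textNode, string term)
    {
        int index = textNode.Value.IndexOf(term, StringComparison.Ordinal);
        if (index < 0) return false;

        // SplitText keeps the text before `index` in textNode and returns
        // a new sibling text node holding the rest.
        XmlText match = textNode.SplitText(index);
        // Split once more so that `match` holds exactly the term.
        match.SplitText(term.Length);

        XmlElement span = textNode.OwnerDocument.CreateElement("span");
        span.SetAttribute("class", "highlighted");

        // Put the span where the match was, then move the match into it.
        match.ParentNode.ReplaceChild(span, match);
        span.AppendChild(match);
        return true;
    }

    public static void Main()
    {
        var doc = new XmlDocument();
        doc.LoadXml("<p>This is a text that will be highlighted.<anothertag /></p>");
        WrapFirstMatch((XmlText)doc.DocumentElement.FirstChild, "highlighted");
        Console.WriteLine(doc.DocumentElement.OuterXml);
    }
}
```

Because nothing is parsed as XML here, the text never needs escaping, and a search term that happens to look like a tag name (such as span) is treated as ordinary text. For the per-word stemming case, the same split-and-wrap step could be repeated on the text node that follows the inserted element.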