I want to replace the inner text of HTML tags with other text. I am using HtmlAgilityPack, and I use this code to extract all the text nodes:

HtmlDocument doc = new HtmlDocument();
doc.Load("some path");

foreach (HtmlNode node in doc.DocumentNode.SelectNodes("//text()[normalize-space(.) != '']")) {
    // How to replace node.InnerText with some text ?
}

But InnerText is read-only. How can I replace each text with other text and save the result to a file?

Shahin
  • An element's inner text is the combination of the inner text of all its child tags. Do you want to replace all child tags with a text node? – Yuriy Rozhovetskiy Nov 25 '11 at 21:59
  • @YuriyRozhovetskiy I want to replace each element's text with other text. In fact, I want to translate a website into another language: extract all the text from a page, then translate, replace, and save it. – Shahin Nov 25 '11 at 22:02
  • It's odd that the XML documentation says that this property `Gets or Sets the text between the start and end tags of the object.` but then only provides a `get` method... – BrainSlugs83 Jun 12 '15 at 01:03

3 Answers

Try the code below. Compared with your XPath expression, this one restricts the search to <body>, filters out the text content of <script> tags, and uses not(*) to keep only leaf nodes (nodes without child elements). You may need to add some additional filtering.

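// Non-empty text nodes under <body>, skipping the contents of <script> tags.
var nodes = doc.DocumentNode.SelectNodes("//body//text()[(normalize-space(.) != '') and not(parent::script) and not(*)]");
if (nodes != null) // SelectNodes returns null when nothing matches
{
    foreach (HtmlNode htmlNode in nodes)
    {
        // InnerText is read-only, so swap in a newly created text node instead.
        htmlNode.ParentNode.ReplaceChild(HtmlTextNode.CreateNode(htmlNode.InnerText + "_translated"), htmlNode);
    }
}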
Yuriy Rozhovetskiy
  • Very good, thanks. How can I write the translated HTML back over the previous file? I load the nodes from a file. – Shahin Nov 25 '11 at 23:15
  • If possible, please describe the difference between my XPath and yours. – Shahin Nov 25 '11 at 23:28
  • Just for extra clarification that the correct way to set text to a node is by replacing the `HtmlTextNode` with the new one that's created with `HtmlTextNode.CreateNode("text here...")` – KFL Nov 22 '16 at 06:29
  • I know this is an old question but you saved my life here, thank you! – Paulo Hgo Dec 02 '17 at 00:31
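
On the follow-up about writing the result back to the original file, here is a minimal sketch of the whole round trip. It assumes the HtmlAgilityPack package is referenced; the class name `TranslateFile`, the `Translate` helper (a stand-in for a real translation call), and taking the path from the command line are all hypothetical choices made for illustration. It sets `HtmlTextNode.Text` rather than replacing nodes, which should give the same result here.

```csharp
using System;
using HtmlAgilityPack;

static class TranslateFile
{
    // Hypothetical stand-in for a real translation call.
    static string Translate(string text) => text + "_translated";

    // Rewrites every non-empty text node under <body>, skipping <script> contents.
    public static void TranslateTextNodes(HtmlDocument doc)
    {
        var nodes = doc.DocumentNode.SelectNodes(
            "//body//text()[normalize-space(.) != '' and not(parent::script)]");
        if (nodes == null) return; // SelectNodes returns null when nothing matches

        foreach (HtmlNode node in nodes)
        {
            // InnerText is read-only, but HtmlTextNode.Text can be assigned.
            if (node is HtmlTextNode textNode)
                textNode.Text = Translate(textNode.Text);
        }
    }

    static void Main(string[] args)
    {
        if (args.Length != 1)
        {
            Console.WriteLine("usage: TranslateFile <path-to-html-file>");
            return;
        }

        string path = args[0];
        var doc = new HtmlDocument();
        doc.Load(path);          // read the existing file
        TranslateTextNodes(doc); // modify the text nodes in memory
        doc.Save(path);          // write the modified markup back to the same path
    }
}
```

Saving to the path you loaded from overwrites the original, so it may be worth saving to a different file first while testing. There is also a `Save(string, Encoding)` overload if the output encoding needs to be set explicitly.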
Strange, but I found that InnerHtml isn't read-only. When I set it like this:

aElement.InnerHtml = "sometext";

the value of InnerText also changed to "sometext".

lena
  • But you run the risk of changing the HTML tags as well. – jnoreiga Sep 14 '12 at 12:23
  • InnerHtml is not read-only. InnerText is. The documentation seems wrong about InnerText not being read-only. – liang May 21 '13 at 09:54
  • While `InnerHtml` supports get/set, in certain situations it does not always appear to actually change the document content. If you set it, and then look at the document's `OuterHtml`, the content is not always changed. – Memetican Aug 15 '17 at 09:32
  • At the time of this comment (2021.01.04) it only supports get operations. – mrbitzilla Jan 05 '21 at 01:42
The HtmlTextNode class has a Text property* which works perfectly for this purpose.

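Here's an example (Cast needs using System.Linq, and //body/text() matches only text nodes that are direct children of <body>; use //body//text() to reach all descendants):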

var textNodes = doc.DocumentNode.SelectNodes("//body/text()").Cast<HtmlTextNode>();
foreach (var node in textNodes)
{
    node.Text = node.Text.Replace("foo", "bar");
}

And if we have an HtmlNode whose direct text we want to change, we can do something like the following:

HtmlNode node = //...
var textNode = (HtmlTextNode)node.SelectSingleNode("text()");
textNode.Text = "new text";

Or we can use node.SelectNodes("text()") in case it has more than one.
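
A minimal sketch of that multi-node case, assuming the HtmlAgilityPack package is referenced. The `DirectText` class, the `ReplaceDirectText` method, and its `edit` callback are hypothetical names introduced only for this example.

```csharp
using System;
using HtmlAgilityPack;

static class DirectText
{
    // Applies `edit` to each text node that is a direct child of `node`.
    // Text inside nested elements is left alone.
    public static void ReplaceDirectText(HtmlNode node, Func<string, string> edit)
    {
        var textNodes = node.SelectNodes("text()");
        if (textNodes == null) return; // SelectNodes returns null when nothing matches

        foreach (HtmlNode child in textNodes)
        {
            if (child is HtmlTextNode textNode)
                textNode.Text = edit(textNode.Text);
        }
    }
}
```

For a paragraph such as `<p>foo <b>foo</b> foo</p>`, calling `DirectText.ReplaceDirectText(p, t => t.Replace("foo", "bar"))` should change the two text nodes directly under `<p>` and leave the text inside `<b>` as it was.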


* Not to be confused with the readonly InnerText property.