I want to replace a node with a new node. How can I get the exact position of the node and do a complete replace?

I've tried the following, but I can't figure out how to get the index of the node or which parent node to call ReplaceChild() on.

string html = "<b>bold_one</b><strong>strong</strong><b>bold_two</b>";
HtmlDocument document = new HtmlDocument();
document.LoadHtml(html);

var bolds = document.DocumentNode.Descendants().Where(item => item.Name == "b");

foreach (var item in bolds)
{

    string newNodeHtml = GenerateNewNodeHtml();
    HtmlNode newNode = new HtmlNode(HtmlNodeType.Text, document, ?);
    item.ParentNode.ReplaceChild( )
}
Omar

2 Answers

To create a new node, use the HtmlNode.CreateNode() factory method rather than calling the constructor directly.

This code should work out for you:

var htmlStr = "<b>bold_one</b><strong>strong</strong><b>bold_two</b>";
var doc = new HtmlDocument();
doc.LoadHtml(htmlStr);

var query = doc.DocumentNode.Descendants("b");
foreach (var item in query.ToList())
{
    var newNodeStr = "<foo>bar</foo>";
    var newNode = HtmlNode.CreateNode(newNodeStr);
    item.ParentNode.ReplaceChild(newNode, item);
}

Note that we need to call ToList() on the query; we will be modifying the document, so it would fail if we don't.
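
To illustrate the snapshot idea on its own, here is a minimal sketch that uses a plain List<string> as a stand-in for the document (an assumption made for brevity; it is not HtmlAgilityPack code, and the exact failure you see with a live Descendants() query may differ). It only shows the general pattern: changing a collection while enumerating it breaks the enumeration, while enumerating a ToList() copy does not.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class SnapshotDemo
{
    static void Main()
    {
        // Stand-in for the nodes of a document (illustrative only).
        var tags = new List<string> { "b", "strong", "b" };

        try
        {
            // Enumerating the live list while removing from it.
            foreach (var tag in tags)
            {
                if (tag == "b") tags.Remove(tag);
            }
        }
        catch (InvalidOperationException ex)
        {
            // List<T> detects the modification on the next MoveNext().
            Console.WriteLine("Live enumeration failed: " + ex.Message);
        }

        // ToList() copies the matches first, so the loop walks the copy
        // while the original list is free to change.
        foreach (var tag in tags.Where(t => t == "b").ToList())
        {
            tags.Remove(tag);
        }

        Console.WriteLine(string.Join(",", tags)); // prints "strong"
    }
}
```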


If you wish to replace with this string:

"some text <b>node</b> <strong>another node</strong>"

The problem is that it is no longer a single node but a series of nodes. HtmlNode.CreateNode() parses it fine, but the node it returns is only the first of the sequence. To replace with the whole sequence, pass that node's ParentNode to ReplaceChild().

var htmlStr = "<b>bold_one</b><strong>strong</strong><b>bold_two</b>";
var doc = new HtmlDocument();
doc.LoadHtml(htmlStr);

var query = doc.DocumentNode.Descendants("b");
foreach (var item in query.ToList())
{
    var newNodesStr = "some text <b>node</b> <strong>another node</strong>";
    var newHeadNode = HtmlNode.CreateNode(newNodesStr);
    item.ParentNode.ReplaceChild(newHeadNode.ParentNode, item);
}
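
If you would rather not rely on the parent of the node returned by CreateNode(), one possible alternative is to parse the replacement markup into its own HtmlDocument and move its top-level nodes in one by one. This is only a sketch: ReplaceWithHtml is a hypothetical helper name, not part of HtmlAgilityPack, and behavior may vary between library versions.

```csharp
using System;
using System.Linq;
using HtmlAgilityPack;

class ReplaceWithFragment
{
    // Hypothetical helper (not a library method): replaces oldNode
    // with every top-level node parsed from the given HTML string.
    static void ReplaceWithHtml(HtmlNode oldNode, string html)
    {
        // Parse the fragment in a separate document so that all of its
        // top-level nodes are reachable through DocumentNode.ChildNodes.
        var fragment = new HtmlDocument();
        fragment.LoadHtml(html);

        var parent = oldNode.ParentNode;

        // Snapshot the children before inserting them elsewhere.
        foreach (var child in fragment.DocumentNode.ChildNodes.ToList())
        {
            parent.InsertBefore(child, oldNode);
        }

        parent.RemoveChild(oldNode);
    }

    static void Main()
    {
        var doc = new HtmlDocument();
        doc.LoadHtml("<b>bold_one</b><strong>strong</strong><b>bold_two</b>");

        // ToList() keeps the newly inserted <b> nodes out of this loop.
        foreach (var item in doc.DocumentNode.Descendants("b").ToList())
        {
            ReplaceWithHtml(item, "some text <b>node</b> <strong>another node</strong>");
        }

        Console.WriteLine(doc.DocumentNode.OuterHtml);
    }
}
```

The inserted nodes sit directly under the original parent here, rather than under a nested document node.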
Jeff Mercado
  • Is there a way to replace a node with multiple other nodes? For example, if `newNodeStr='some text <b>node</b> <strong>another node</strong>'`, the replace doesn't work. – Omar Jul 22 '11 at 15:43
  • @Omar: Updated. If you parse that string using `HtmlNode.CreateNode()`, it will result in creating a reference to the first node. So if you replaced with that, you'd only see the first one being replaced. You should actually be replacing the `ParentNode` to grab all of them. – Jeff Mercado Jul 22 '11 at 19:24
  • After thinking about it, it might be safe to just always use `ParentNode` since a new, single node's parent is effectively itself when doing replacements. – Jeff Mercado Jul 23 '11 at 02:48
  • This won't work if item has multiple tags, e.g. test – Jason Dias Feb 11 '14 at 21:28
  • @Jason: Sure it does... did you actually _try_ it? – Jeff Mercado Feb 11 '14 at 21:33
  • @JeffMercado Yes, but I think it's just my code. As crazy as it sounds, I am getting a different result while debugging. Anyway, thanks for your quick response. – Jason Dias Feb 12 '14 at 15:09
  • "_Note that we need to call `ToList()` on the query, we will be modifying the document so it would fail if we don't._" That statement was the most useful part for me. Thanks. – Ahmed Mostafa Jun 17 '15 at 11:16
  • I also find that CreateNode does not create sub-nodes. If I have "blah blah <p>foo bar</p>" then it truncates it to "blah blah" with no nested p tag. The same happens with "blah blah" followed by a single tag: the tag is left out. I had to swap out my code with HtmlDocument.LoadHtml. But I'm sure it used to work - maybe a debug/compile quirk. – Etherman Nov 28 '16 at 14:58
  • @Etherman: As stated in the answer, `CreateNode()` will return _a_ node. That is a _series_ of nodes (a text node followed by an HTML node). Using `CreateNode()`, you'll obtain a reference to the first node, but you'll need to use the parent node to access the rest. – Jeff Mercado Nov 28 '16 at 15:45
  • Ah, I get what you mean...I just assumed it would return the topmost root node of whatever it created, not something inside it...never even occurred to me to start looking UP the tree...seems an odd design choice to me! – Etherman Nov 28 '16 at 17:31
  • @JeffMercado I have a problem replacing child tags with text. I need to replace one tag with a double new line and another with a single one. The input is "Das geilste Rasierwasser ever." wrapped in those tags. `htmlDocument.DocumentNode.Descendants().Where(n => n.Name != "#text" && n.Name != "#document").ToList()` returns two elements. After the first is replaced, on the second iteration item.ParentNode is null. Do you know how I can handle it? Thank you – VladL Feb 08 '17 at 11:06
  • @JeffMercado Here is the function that does the double-new-line replacement. The div handler does the same except with 1 NL: `private static void SurroundWithDoubleLineBreak(HtmlNode node) { var text = Environment.NewLine + Environment.NewLine + node.InnerHtml + Environment.NewLine + Environment.NewLine; node.ParentNode.ReplaceChild(HtmlNode.CreateNode(text).ParentNode, node); }` – VladL Feb 08 '17 at 11:08
  • Still helpful 10 years later and worked perfectly for me. +1 – Yogi Jan 14 '22 at 15:11

I implemented the following solution to achieve the same result.

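var htmlStr = "<b>bold_one</b><div class='LatestLayout'><div class='olddiv'><strong>strong</strong></div></div><b>bold_two</b>";
var htmlDoc = new HtmlDocument();
htmlDoc.LoadHtml(htmlStr); // LoadHtml() parses a string; Load() expects a file path or stream

// newChild was not defined in the original snippet; this placeholder markup is an assumption.
var newChild = HtmlNode.CreateNode("<div class='newdiv'>replacement</div>");

// Remove the old element, then prepend the new node to its former container.
htmlDoc.DocumentNode.SelectSingleNode("//div[@class='olddiv']").Remove();
htmlDoc.DocumentNode.SelectSingleNode("//div[@class='LatestLayout']").PrependChild(newChild);

htmlDoc.Save(FilePath); // FilePath: full path of the .html file, if the result needs to be saved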

This selects the old element and removes it, then prepends the new node as a child of the containing element.

BJ Patel