
So I'm trying to find all matching items in a chunk of HTML. Let's use the following as an example:

<body>
    <div>
    test
    <a href="test">
        <img src="test">
    </a>
    </div>
</body>

So we go looking for the word test and find it first in the text of the <div>, then in the href of the <a>, and then in the src of the <img>.

What I'd like to be able to do is retrieve the parent-level HTML without its attendant child-level HTML. Currently this is not attainable with InnerHtml/OuterHtml, as both of those include child-level markup. What I'm trying to work out is how to walk the DOM tree but work only at the parent level: <div></div>, then test, then <a href="test"></a>, and then <img src="test">.

I'm using HTML Agility Pack but am willing to consider anything that gives me the functionality I think I need.

bugmagnet
    Hi bugmagnet. What you're looking for is called [Breadth-first search](https://en.wikipedia.org/wiki/Breadth-first_search) (BFS), instead of [Depth-first search](https://en.wikipedia.org/wiki/Depth-first_search) (DFS). [This answer](https://stackoverflow.com/a/39504040/5675325) might give some inspiration. – Tiago Martins Peres Oct 29 '19 at 09:33

1 Answer


This is the solution I posted today on DEV. See the article for further discussion and an example.

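using System.Linq;
using HtmlAgilityPack;

// Returns the node's own markup plus its direct text, with all other children stripped.
// Note: the children are removed from the node passed in, so the document is modified.
private string OuterMinusInner(HtmlNode root)
{
    if (root == null)
        return string.Empty;

    // ToList() takes a snapshot so children can be removed while iterating.
    foreach (var nodeFromList in
        (from node
         in root.ChildNodes
         where node.NodeType != HtmlNodeType.Text
         select node).ToList())
    {
        root.RemoveChild(nodeFromList);
    }

    return root.OuterHtml;
}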
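Since OuterMinusInner removes children from the node it is given, calling it while walking a live document would strip the tree as you go. A minimal sketch of one way around that, assuming HTML Agility Pack's CloneNode(true) and Descendants() behave as documented: apply the same filtering to a deep copy of each element. The names Demo and OuterMinusInnerCopy and the sample markup are hypothetical, made up for this illustration.

```csharp
using System;
using System.Linq;
using HtmlAgilityPack;

public static class Demo
{
    // Same idea as OuterMinusInner, but run against a deep copy of the node
    // so the original document is left untouched. (Hypothetical helper.)
    public static string OuterMinusInnerCopy(HtmlNode node)
    {
        if (node == null)
            return string.Empty;

        // Deep clone: the copy gets its own children, which are safe to remove.
        var copy = node.CloneNode(true);

        // Snapshot the non-text children first, then remove them from the copy.
        foreach (var child in copy.ChildNodes
                                  .Where(n => n.NodeType != HtmlNodeType.Text)
                                  .ToList())
        {
            copy.RemoveChild(child);
        }

        return copy.OuterHtml;
    }

    public static void Main()
    {
        var doc = new HtmlDocument();
        doc.LoadHtml("<body><div>test<a href=\"test\"><img src=\"test\"></a></div></body>");

        // Visit every element in document order and print its parent-level markup.
        foreach (var element in doc.DocumentNode.Descendants()
                                   .Where(n => n.NodeType == HtmlNodeType.Element))
        {
            Console.WriteLine(OuterMinusInnerCopy(element));
        }
    }
}
```

Cloning every element costs more than editing in place, so for large documents it may be cheaper to build the opening and closing tags from the node's Name and Attributes instead.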

bugmagnet