<html>
    <body>
        <div class="main">
            <div class="submain"><h2></h2><p></p><ul></ul>
            </div>
            <div class="submain"><h2></h2><p></p><ul></ul>
            </div>
        </div>
    </body>
</html>

I loaded the HTML into an HtmlDocument and selected the submain divs with XPath, but I don't know how to access each tag inside them (h2, p, and so on) separately.

HtmlAgilityPack.HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes("//div[@class=\"submain\"]");
foreach (HtmlAgilityPack.HtmlNode node in nodes) {}

If I use node.InnerText I get all the text at once, and InnerHtml isn't useful either. How do I select the individual tags?

Val
Ajit

3 Answers


The following will help:

// Assumes: using HtmlAgilityPack;
HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes("//div[@class=\"submain\"]");
foreach (HtmlNode node in nodes) {
    // To access the <h2> and <p> inside each div, query relative to the current node:
    HtmlNode h2Node = node.SelectSingleNode("./h2"); // first <h2> that is a direct child
    HtmlNodeCollection allH2Nodes = node.SelectNodes(".//h2"); // every <h2> at any depth; null if none match

    // You can also look at the children without using XPath (like in a tree):
    HtmlNode h2Child = node.ChildNodes["h2"];
}
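
As a fuller illustration of the relative-XPath approach above, here is a minimal, self-contained sketch. The text inside the h2, p and li elements is invented for the example (the question's markup is empty), and the class name SubmainDemo is hypothetical. It assumes the HtmlAgilityPack package is referenced.

```csharp
using System;
using HtmlAgilityPack;

class SubmainDemo
{
    static void Main()
    {
        // Same shape as the question's HTML; the text content is made up for illustration.
        const string html = @"<html><body><div class=""main"">
            <div class=""submain""><h2>First</h2><p>One</p><ul><li>a</li><li>b</li></ul></div>
            <div class=""submain""><h2>Second</h2><p>Two</p><ul><li>c</li></ul></div>
        </div></body></html>";

        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes returns null (not an empty collection) when nothing matches.
        HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes("//div[@class='submain']");
        if (nodes == null) return;

        foreach (HtmlNode node in nodes)
        {
            // "./" keeps each query relative to the current submain div.
            HtmlNode h2 = node.SelectSingleNode("./h2");
            HtmlNode p = node.SelectSingleNode("./p");
            Console.WriteLine("h2: " + (h2 != null ? h2.InnerText : "(none)"));
            Console.WriteLine("p:  " + (p != null ? p.InnerText : "(none)"));

            HtmlNodeCollection items = node.SelectNodes("./ul/li");
            if (items != null)
            {
                foreach (HtmlNode li in items)
                    Console.WriteLine("li: " + li.InnerText);
            }
        }
    }
}
```

Note that an XPath starting with "//" searches the whole document even when called on a child node, which is why the per-div queries start with "./".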
Oscar Mederos

You are looking for Descendants

// GetAttributeValue avoids a NullReferenceException on nodes that have no class attribute
var firstSubmainText = doc
   .DocumentNode
   .Descendants()
   .Where(n => n.GetAttributeValue("class", "") == "submain")
   .First()
   .InnerText;
Radim Köhler

From memory, I believe each node has its own ChildNodes collection, so within your foreach block you should be able to inspect node.ChildNodes.
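
A minimal sketch of walking ChildNodes, assuming the HtmlAgilityPack package is referenced. The sample markup, its text content, and the class name ChildNodesDemo are invented for illustration.

```csharp
using System;
using HtmlAgilityPack;

class ChildNodesDemo
{
    static void Main()
    {
        var doc = new HtmlDocument();
        doc.LoadHtml("<div class='submain'><h2>Title</h2><p>Body</p><ul><li>x</li></ul>\n</div>");

        HtmlNode node = doc.DocumentNode.SelectSingleNode("//div[@class='submain']");
        if (node == null) return;

        foreach (HtmlNode child in node.ChildNodes)
        {
            // ChildNodes also contains whitespace text nodes, so keep elements only.
            if (child.NodeType != HtmlNodeType.Element) continue;
            Console.WriteLine(child.Name + ": " + child.InnerText.Trim());
        }
    }
}
```

ChildNodes only covers direct children, so nested elements such as the li items show up as part of their parent's InnerText rather than as separate entries.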

Zhaph - Ben Duguid
Jay