
How can I select every paragraph inside a div tag? For example:

<div id="body_text">
<p>Hi</p>
<p>Help Me Please</p>
<p>Thankyou</p>
</div>

I have downloaded Html Agility Pack and referenced it in my program. All I need is the paragraphs. There may be a variable number of paragraphs, and the page has many other div tags, but I only need the content within body_text. I assume the result can be stored as a string, which I then want to write to a .txt file for later reference. Thank you.

mintuz
Possible duplicate of http://stackoverflow.com/questions/2111332/select-all-ps-from-a-nodes-children-using-htmlagilitypack ? P.S. I don't know how to flag duplicates, or maybe I don't have enough points. – Ozzy Jan 19 '11 at 16:27

2 Answers


The XPath expression for your case is //div[@id='body_text']/p

// SelectNodes returns null when nothing matches, so check for that on real input.
foreach (HtmlNode node in yourHTMLAgilityPackDocument.DocumentNode.SelectNodes("//div[@id='body_text']/p"))
{
  string text = node.InnerText; // the text of one paragraph
}
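The question also asks about storing the text as a string and writing it to a .txt file. A minimal sketch of that step, assuming the same XPath as above: the class name ParagraphExtractor, the method ExtractParagraphs, and the file names your.html and body_text.txt are all hypothetical, chosen only for illustration.

```csharp
using System;
using System.IO;
using System.Linq;
using HtmlAgilityPack;

public static class ParagraphExtractor
{
    // Hypothetical helper: returns the text of every <p> directly inside
    // <div id="body_text">, one paragraph per line.
    public static string ExtractParagraphs(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes returns null (not an empty collection) when nothing matches.
        var nodes = doc.DocumentNode.SelectNodes("//div[@id='body_text']/p");
        if (nodes == null)
        {
            return string.Empty;
        }

        // DeEntitize turns entities such as &amp; back into plain characters.
        var lines = nodes.Select(n => HtmlEntity.DeEntitize(n.InnerText).Trim());
        return string.Join(Environment.NewLine, lines);
    }

    public static void Main()
    {
        // Assumed file names; replace with your own paths.
        string html = File.ReadAllText("your.html");
        string text = ExtractParagraphs(html);
        File.WriteAllText("body_text.txt", text);
    }
}
```

Joining the paragraphs with Environment.NewLine keeps one paragraph per line in the output file; use a different separator if you want blank lines between them.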
dlock

Here's a solution that grabs the paragraphs as an enumeration of HtmlNodes:

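using System.Linq;       // needed for Where()
using HtmlAgilityPack;

HtmlDocument doc = new HtmlDocument();
doc.Load("your.html");
var div = doc.GetElementbyId("body_text"); // null if no element has this id
var paragraphs = div.ChildNodes.Where(item => item.Name == "p"); // direct <p> children only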

Without explicit LINQ:

var paragraphs = doc.GetElementbyId("body_text").Elements("p");
Corbin March