1

I am trying to grab data from a web page, specifically from the <div> elements with the class "personal_info". Each page contains 10 to 15 such <div>s, all with the same class (as shown in the HTML below), and I want to extract all of them.

<div class="personal_info"><span class="bold">Rama Anand</span><br><br> Mobile: 9916184586<br>rama_asset@hotmail.com<br> Bangalore</div>

To do this I started using HtmlAgilityPack, as someone on Stack Overflow suggested, but I am stuck right at the beginning because I don't know the library well. My C# code so far:

        HtmlAgilityPack.HtmlWeb docHFile = new HtmlAgilityPack.HtmlWeb();

        // HtmlWeb.Load returns a new HtmlDocument, so none needs to be created beforehand.
        HtmlAgilityPack.HtmlDocument docHtml = docHFile.Load("http://127.0.0.1/2.html");

How do I continue from here so that the data from the <div>s whose class is "personal_info" can be extracted? A suggestion with an example would be appreciated.

panindra

3 Answers

2

I can't check this right now, but isn't it:

var infos = from info in docHtml.DocumentNode.SelectNodes("//div[@class='personal_info']") select info; 
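
A minimal sketch of how the XPath result might be consumed, assuming the HtmlAgilityPack NuGet package is referenced. The class name PersonalInfoXPath, the helper ExtractInfos and the sample HTML are made up for illustration. One thing worth guarding against: SelectNodes returns null, not an empty collection, when nothing matches.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

public static class PersonalInfoXPath
{
    // Hypothetical helper: returns the text of every <div class="personal_info">.
    public static List<string> ExtractInfos(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var results = new List<string>();

        // SelectNodes returns null (not an empty collection) when there is no match.
        HtmlNodeCollection nodes =
            doc.DocumentNode.SelectNodes("//div[@class='personal_info']");
        if (nodes == null)
        {
            return results;
        }

        foreach (HtmlNode div in nodes)
        {
            // InnerText drops the markup; DeEntitize decodes entities such as &amp;.
            results.Add(HtmlEntity.DeEntitize(div.InnerText).Trim());
        }
        return results;
    }

    public static void Main()
    {
        // Made-up sample markup, shaped like the <div> in the question.
        const string html =
            "<div class=\"personal_info\"><span class=\"bold\">Jane Doe</span><br>" +
            " Mobile: 0000000000<br> Bangalore</div>" +
            "<div class=\"other\">ignored</div>";

        foreach (string info in ExtractInfos(html))
        {
            Console.WriteLine(info);
        }
    }
}
```

The XPath test @class='personal_info' is an exact string comparison, so a <div> carrying several classes (for example class="personal_info bold") would not match; contains(@class, 'personal_info') is a looser alternative.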
Carson63000
2

To load a URL you can do something like:

 // Requires: using System.Net; using System.Text;
 var document = new HtmlAgilityPack.HtmlDocument();
 var url = "http://www.google.com";
 var request = (HttpWebRequest)WebRequest.Create(url);
 // Read the response body straight into the document, decoding it as UTF-8.
 using (var responseStream = request.GetResponse().GetResponseStream())
 {
   document.Load(responseStream, Encoding.UTF8);
 }

Also note that there is a fork that lets you use jQuery selectors with HtmlAgilityPack.

IEnumerable<HtmlNode> myList = document.QuerySelectorAll(".personal_info");

http://yosi-havia.blogspot.com/2010/10/using-jquery-selectors-on-server-sidec.html

sclarson
  • +1 I didn't know there was a server side jQuery selector available, awesome. I traditionally used xpath with HtmlAgilityPack – Jason Jong Jul 01 '11 at 04:38
0

What happened to Where?

node.DescendantNodes().Where(node_it => node_it.Name=="div");

If you want to start from the top (root) node, use page.DocumentNode as "node".
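
The filter above keeps every <div>, so for the question's case it would also need a check on the class attribute. A rough sketch of that, assuming the HtmlAgilityPack NuGet package is referenced; the class name PersonalInfoLinq, the helper FindPersonalInfoDivs and the sample HTML are made up for illustration, and Descendants("div") is used here in place of DescendantNodes() plus a Name comparison.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

public static class PersonalInfoLinq
{
    // Hypothetical helper: all <div> descendants of `node` whose class
    // attribute is exactly "personal_info".
    public static List<HtmlNode> FindPersonalInfoDivs(HtmlNode node)
    {
        return node.Descendants("div")
                   // GetAttributeValue falls back to "" when the attribute is missing.
                   .Where(d => d.GetAttributeValue("class", "") == "personal_info")
                   .ToList();
    }

    public static void Main()
    {
        var page = new HtmlDocument();
        // Made-up sample markup for illustration.
        page.LoadHtml(
            "<div class=\"personal_info\">first</div>" +
            "<div>no class</div>" +
            "<div class=\"personal_info\">second</div>");

        foreach (HtmlNode div in FindPersonalInfoDivs(page.DocumentNode))
        {
            Console.WriteLine(div.InnerText);
        }
    }
}
```

Unlike SelectNodes, the LINQ query yields an empty sequence when nothing matches, so no null check is needed before iterating.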

greenoldman