I'm trying to parse the following HTML snippet with the HTML Agility Pack:

<body id="station_page" class="">
...
<div>....</div>
<script type="text/javascript"> 
if (Blablabla == undefined) { var Blablabla = {}; }
Blablabla .Data1= "I want this data";
Blablabla .BlablablaData = 
{  "Data2":"I want this data",
"Blablabla":"",
"Blablabla":0   }
{   "Blablabla":123,
"Data3":"I want this data",
"Blablabla":123}
    Blablabla .Data4= I want this data;
</script>...

I'm trying to get those four values (Data1, Data2, Data3, Data4). First, I tried to find the script element:

doc.DocumentNode.SelectSingleNode("//script[@type='text/javascript']").InnerHtml

How can I check that it's really the right script? And once I've found the relevant script, how can I extract those four values (Data1, Data2, Data3, Data4)?

  • I think this is the wrong way of doing it. Not sure what's the right way, but this (using htmlagilitypack) isn't it. – Th0rndike Mar 08 '13 at 14:47
  • Sounds like you need to execute the javascript, not just to parse it? If so then here's one way to do it: http://stackoverflow.com/questions/2530789/evaluate-javascript-to-plain-text-using-c-net-3-5/9415417#9415417 – Dmitriy Khaykin Mar 08 '13 at 14:53

1 Answer

HTML Agility Pack can't parse JavaScript; it only parses HTML. You can get to the script you need with an XPath expression like this:

doc.DocumentNode.SelectSingleNode("//script[contains(text(), 'Blablabla')]").InnerHtml

But you'll need to parse the JavaScript itself with another method (a regex, a JavaScript grammar, etc.).
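As a rough illustration of the regex route, here is a minimal sketch that pulls the four values out of the script text. It assumes the script looks exactly like the snippet in the question; the class name ScriptDataExtractor and its helper methods are made up for this example, and the patterns will break if the page changes its formatting (escaped quotes inside a value, for instance). In the real program the script string would come from the InnerHtml of the node selected above rather than from a literal.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper class, not part of HTML Agility Pack.
public static class ScriptDataExtractor
{
    // Returns the trimmed first capture group, or null when the pattern does not match.
    public static string Capture(string script, string pattern)
    {
        Match match = Regex.Match(script, pattern);
        return match.Success ? match.Groups[1].Value.Trim() : null;
    }

    // Matches a quoted assignment such as: Blablabla .Data1= "value";
    public static string QuotedAssignment(string script, string name)
    {
        return Capture(script, @"\." + Regex.Escape(name) + @"\s*=\s*""([^""]*)""");
    }

    // Matches a quoted object property such as: "Data2":"value"
    public static string QuotedProperty(string script, string name)
    {
        return Capture(script, @"""" + Regex.Escape(name) + @"""\s*:\s*""([^""]*)""");
    }

    // Matches an unquoted assignment such as: Blablabla .Data4= value;
    public static string BareAssignment(string script, string name)
    {
        return Capture(script, @"\." + Regex.Escape(name) + @"\s*=\s*([^;]+);");
    }

    public static void Main()
    {
        // Stand-in for the script node's InnerHtml, copied from the question.
        string script = @"
if (Blablabla == undefined) { var Blablabla = {}; }
Blablabla .Data1= ""I want this data"";
Blablabla .BlablablaData = 
{  ""Data2"":""I want this data"",
""Blablabla"":"""",
""Blablabla"":0   }
{   ""Blablabla"":123,
""Data3"":""I want this data"",
""Blablabla"":123}
    Blablabla .Data4= I want this data;
";

        Console.WriteLine("Data1 = " + QuotedAssignment(script, "Data1"));
        Console.WriteLine("Data2 = " + QuotedProperty(script, "Data2"));
        Console.WriteLine("Data3 = " + QuotedProperty(script, "Data3"));
        Console.WriteLine("Data4 = " + BareAssignment(script, "Data4"));
    }
}
```

Each helper returns null when nothing matches, so a null result is also a cheap way to tell that the selected script was not the one you were after.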

Simon Mourier