
I have continually had problems with Html Agility Pack; my XPath queries only ever work when they are extremely simple:

//*[@id='some_id']

or

//input

However, whenever the queries get more complicated, Html Agility Pack can't handle them. Here's an example demonstrating the problem. I'm using WebDriver to navigate to Google and return the page source, which is then passed to Html Agility Pack. Both WebDriver and Html Agility Pack then attempt to locate the same element/node (C#):

//The XPath query
const string xpath = "//form//tr[1]/td[1]//input[@name='q']";

//Navigate to Google and get page source
var driver = new FirefoxDriver(new FirefoxProfile()) { Url = "http://www.google.com" };
Thread.Sleep(2000);

//Can WebDriver find it?
var e = driver.FindElementByXPath(xpath);
Console.WriteLine(e!=null ? "Webdriver success" : "Webdriver failure");

//Can Html Agility Pack find it?
var source = driver.PageSource;
var htmlDoc = new HtmlDocument { OptionFixNestedTags = true };
htmlDoc.LoadHtml(source);
var nodes = htmlDoc.DocumentNode.SelectNodes(xpath);
Console.WriteLine(nodes!=null ? "Html Agility Pack success" : "Html Agility Pack failure");

driver.Quit();

In this case, WebDriver successfully located the item, but Html Agility Pack did not.

I know that in this case it's very easy to change the XPath to one that will work, //input[@name='q'], but that only fixes this specific example, which isn't the point. I need something that will exactly, or at least closely, mirror the behavior of WebDriver's XPath engine, or even of the FirePath or FireFinder add-ons for Firefox.

If WebDriver can find it, then why can't Html Agility Pack find it too?

Anders
  • I have had good success with Html Agility Pack's XPath parser, so I'm wondering if perhaps the XPath is suboptimal. Here's an example of one that works for me in a production app: `.//div[@id=\"main\"]//div[@id=\"content\"]//div[@id=\"title\"]` – hemp May 25 '11 at 17:45
  • Unfortunately, I'm not the one creating the XPath expressions most of the time. I help manage our custom WebDriver framework, so if someone in QA creates an XPath expression that works in WebDriver, it has to work in Html Agility Pack as well. The example above was just meant to capture a common problem we've been having. – Anders May 25 '11 at 19:48

1 Answer


The issue you're running into is with the FORM element. HTML Agility Pack handles that element differently - by default, it will never report that it has children.

In the particular example you gave, this query does find the target element:

.//div/div[2]/table/tr/td/table/tr/td/div/table/tr/td/div/div[2]/input

However, this does not, so it's clear the form element is tripping up the parser:

.//form/div/div[2]/table/tr/td/table/tr/td/div/table/tr/td/div/div[2]/input

That behavior is configurable, though. If you place this line prior to parsing the HTML, the form will give you child nodes:

HtmlNode.ElementsFlags.Remove("form");
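
To show where that line would sit relative to the parsing calls from the question, here is a minimal sketch. The markup is a hypothetical, hand-written stand-in for driver.PageSource (not Google's real page), and the class and method names are made up for illustration. Default handling of form may also differ between Html Agility Pack versions, so treat this as an assumption-laden example rather than a definitive fix.

```csharp
using System;
using HtmlAgilityPack;

public static class FormChildrenDemo
{
    // Hypothetical helper: parses the given HTML and runs the XPath from the question.
    public static HtmlNodeCollection FindQueryInputs(string html)
    {
        // Must run before the HTML is parsed. ElementsFlags is a static dictionary,
        // so this affects every document parsed afterwards in the same process.
        HtmlNode.ElementsFlags.Remove("form");

        var htmlDoc = new HtmlDocument { OptionFixNestedTags = true };
        htmlDoc.LoadHtml(html);

        // SelectNodes returns null (not an empty collection) when nothing matches.
        return htmlDoc.DocumentNode.SelectNodes("//form//tr[1]/td[1]//input[@name='q']");
    }

    public static void Main()
    {
        // Hand-written sample markup; an assumption standing in for driver.PageSource.
        const string html =
            "<html><body><form action='/search'>" +
            "<table><tr><td><input name='q' type='text'></td></tr></table>" +
            "</form></body></html>";

        var nodes = FindQueryInputs(html);
        Console.WriteLine(nodes != null ? "Html Agility Pack success" : "Html Agility Pack failure");
    }
}
```

Because the flag table is static, removing the entry once at application start-up (for example, in the framework's initialization code) should be enough; it does not need to be repeated per document.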
hemp
  • Brilliant! I bet that if I were to look through previous XPath expressions that had problems, I would find that the form node is the root cause of it. – Anders May 25 '11 at 19:30