XPath to locate a cell with specific text parsing HTML tables

Question

Hope someone out there can quickly point me in the right direction with my XPath difficulties.

Currently I've got to the point where I'm identifying the correct table I need in my HTML source, but then I need to process only the rows that have the text 'Chapter' somewhere in the DOM.

My last attempt was this:

// get the correct table
HtmlTable table = page.getFirstByXPath("//table[2]");

// now the failing bit....
def rows = table.getByXPath("*/td[contains(text(),'Chapter')]")

I thought the XPath above would mean: get me all elements that have a child 'td' element which contains the text 'Chapter' somewhere in its DOM.

An example of a matching row from my source is:

<tr valign="top">
  <td nowrap="" align="Right">
   <font face="Verdana">
   <a href="index.cfm?a=1">Chapter 1</a>
   </font>
  </td>
  <td class="ChapterT">
    <font face="Verdana">DEFINITIONS</font>
  </td>
  <td>&nbsp;</td>
</tr>

Any help / pointers greatly appreciated.

Thanks,

score 20 · Accepted Answer · answered Mar 10 '12 at 06:16

Use this XPath:

//td[contains(., 'Chapter')]

Kirill Polishchuk

Thanks, that appears to work. What does the '.' represent? Also, I don't understand why the 'relative' detection isn't working, e.g. you have the // which as I understand it means begin at the root? – David Brown Mar 10 '12 at 13:17

@Dave, You're welcome. `.` and `//` are XPath abbreviated syntax. `.` selects the context node. `//td` selects all the `td` descendants of the document root and thus selects all `td` elements in the same document as the context node. *Reference*: http://www.w3.org/TR/xpath/#path-abbrev – Kirill Polishchuk Mar 10 '12 at 16:24
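
The point about `//` ignoring the context node can be checked outside HtmlUnit. The following is a minimal sketch using the JDK's own `javax.xml.xpath`, not HtmlUnit; the two-table document, the class name `XPathScopeDemo` and the helper `countFromSecondTable` are made up for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class XPathScopeDemo {
    // Hypothetical document: both tables contain a cell with the text 'Chapter'.
    static final String XML =
        "<html><body>"
        + "<table><tr><td>Chapter A</td></tr></table>"
        + "<table><tr>"
        + "<td><font><a href='index.cfm?a=1'>Chapter 1</a></font></td>"
        + "<td><font>DEFINITIONS</font></td>"
        + "</tr></table>"
        + "</body></html>";

    /** Counts the nodes selected by expr when the second table is the context node. */
    static int countFromSecondTable(String expr) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            Node table = (Node) xpath.evaluate("//table[2]", doc, XPathConstants.NODE);
            NodeList hits = (NodeList) xpath.evaluate(expr, table, XPathConstants.NODESET);
            return hits.getLength();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Absolute path: starts at the document root, so it also sees the first table.
        System.out.println(countFromSecondTable("//td[contains(., 'Chapter')]"));
        // Relative path: starts at the context node, so it stays inside the second table.
        System.out.println(countFromSecondTable(".//td[contains(., 'Chapter')]"));
    }
}
```

With this sample the absolute expression matches two cells (one per table), while the relative one matches only the cell in the second table.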

score 9 · Answer 2 · answered Mar 10 '12 at 15:42

You want all td elements under your current node, not all td elements in the document, which is what the currently accepted answer selects.

Use:

.//td[.//text()[contains(., 'Chapter')]]

This selects all td descendants of the current node that have at least one text node descendant whose string value contains the string "Chapter".

If it is known in advance that any td under this table only has a single text node, this can be simplified to just:

.//td[contains(., 'Chapter')]
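
To see where the two expressions can differ, here is a small sketch that again uses the JDK's `javax.xml.xpath` rather than HtmlUnit. The second row, in which the word is split across text nodes, is a made-up case, as are the names `TextNodeDemo` and `countInTable`.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class TextNodeDemo {
    // Row 1 mirrors the question's sample; row 2 is hypothetical and splits
    // the word 'Chapter' across the text nodes "Chap" and "ter".
    static final String XML =
        "<table>"
        + "<tr><td><font><a href='index.cfm?a=1'>Chapter 1</a></font></td>"
        + "<td><font>DEFINITIONS</font></td></tr>"
        + "<tr><td>Chap<b>ter</b> 2</td><td>SCOPE</td></tr>"
        + "</table>";

    /** Counts the nodes selected by expr with the table element as context node. */
    static int countInTable(String expr) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList hits = (NodeList) xpath.evaluate(
                expr, doc.getDocumentElement(), XPathConstants.NODESET);
            return hits.getLength();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Requires a single text node that contains 'Chapter'.
        System.out.println(countInTable(".//td[.//text()[contains(., 'Chapter')]]"));
        // Tests the td's string value, i.e. all descendant text concatenated.
        System.out.println(countInTable(".//td[contains(., 'Chapter')]"));
        // The question's test: only the td's first direct text node is checked.
        System.out.println(countInTable(".//td[contains(text(), 'Chapter')]"));
    }
}
```

On this sample the first expression matches one cell, the second matches two (it also accepts the split word), and the `text()` form from the question matches none, because the link text sits inside `font` and `a` rather than directly in the `td`.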

William Walseth · Answer 3 · 2012-03-10T04:22:39.567

2

You're on the right "path".
The contains(text(), ...) test is limited to the td's own text nodes, not text in any of its child elements. Try this XPath, which you could read as follows: get every tr/td whose first child element contains the text 'Chapter'

tr/td[contains(*,"Chapter")]

Good luck

edited Mar 10 '12 at 04:22

answered Mar 10 '12 at 03:58

Hi William, gave it a go but couldn't get it to return anything. What has worked, although doesn't seem the most efficient is a single liner of ' def chapterAnchors = page.anchors.findAll {HtmlAnchor a -> a.asText().contains('Chapter')} ' – David Brown Mar 10 '12 at 04:29
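
One possible explanation for the empty result, offered as an assumption rather than something this thread confirms: HTML parsers commonly wrap table rows in a `tbody` element even when the source omits it, so `tr` would not be a direct child of the table. The sketch below uses the JDK's `javax.xml.xpath` on a hand-written document that already contains `tbody`; the names `TbodyDemo` and `countInTable` are illustrative only.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class TbodyDemo {
    // Assumed shape of the parsed DOM: the row sits inside an inserted tbody.
    static final String XML =
        "<table><tbody><tr valign='top'>"
        + "<td><font><a href='index.cfm?a=1'>Chapter 1</a></font></td>"
        + "<td><font>DEFINITIONS</font></td>"
        + "<td/>"
        + "</tr></tbody></table>";

    /** Counts the nodes selected by expr with the table element as context node. */
    static int countInTable(String expr) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList hits = (NodeList) xpath.evaluate(
                expr, doc.getDocumentElement(), XPathConstants.NODESET);
            return hits.getLength();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // tr is not a direct child of table here, so nothing is selected.
        System.out.println(countInTable("tr/td[contains(*, 'Chapter')]"));
        // Naming the tbody step explicitly reaches the row.
        System.out.println(countInTable("tbody/tr/td[contains(*, 'Chapter')]"));
        // The descendant axis works whether or not a tbody is present.
        System.out.println(countInTable(".//td[contains(*, 'Chapter')]"));
    }
}
```

Under that assumption the first expression selects nothing while the other two each select the 'Chapter 1' cell, which would also account for why the `.//td` forms in the other answers work.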
