Using XPath to select an element after another

Question

I've seen similar questions, but the solutions I found don't work on the following. I'm far from an XPath expert; I just need to parse some HTML. How can I select the table that follows Header 2? I thought my solution below should work, but apparently it doesn't. Can anyone help me out here?

content = """<div>
<p><b>Header 1</b></p>
<p><b>Header 2</b><br></p>
<table>
<tr>
    <td>Something</td>
</tr>
</table>
</div>
"""

from lxml import etree
tree = etree.HTML(content)
tree.xpath("//table/following::p/b[text()='Header 2']")

score 18 · Answer 1 · answered Oct 09 '13 at 21:04

Some alternatives to @Arup's answer:

tree.xpath("//p[b='Header 2']/following-sibling::table[1]")

This selects the first table sibling that follows the p whose b child has the text "Header 2".

tree.xpath("//b[.='Header 2']/following::table[1]")

This selects the first table in document order after the b whose text is "Header 2".

See the XPath 1.0 specification for details on the different axes:

the following axis contains all nodes in the same document as the context node that are after the context node in document order, excluding any descendants and excluding attribute nodes and namespace nodes
the following-sibling axis contains all the following siblings of the context node; if the context node is an attribute node or namespace node, the following-sibling axis is empty
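
To make the difference between the two axes concrete, here is a minimal sketch using lxml as in the question. The `nested` markup and the helper `first_cell_texts` are hypothetical, introduced only for illustration: when the table is wrapped in an extra div it is no longer a sibling of the p, so only the `following` axis reaches it.

```python
from lxml import etree

# Markup from the question: the table is a direct sibling of the <p> elements.
flat = """<div>
<p><b>Header 1</b></p>
<p><b>Header 2</b><br></p>
<table><tr><td>Something</td></tr></table>
</div>"""

# Hypothetical variant: the table sits inside an extra <div>,
# so it is no longer a sibling of the <p>.
nested = """<div>
<p><b>Header 2</b></p>
<div><table><tr><td>Something</td></tr></table></div>
</div>"""

SIBLING = "//p[b='Header 2']/following-sibling::table[1]"
FOLLOWING = "//b[.='Header 2']/following::table[1]"


def first_cell_texts(html, expr):
    """Return the text of the first <td> of every table matched by expr."""
    tree = etree.HTML(html)
    return [table.findtext(".//td") for table in tree.xpath(expr)]


flat_sibling = first_cell_texts(flat, SIBLING)        # ['Something']
flat_following = first_cell_texts(flat, FOLLOWING)    # ['Something']
nested_sibling = first_cell_texts(nested, SIBLING)    # [] (table is not a sibling)
nested_following = first_cell_texts(nested, FOLLOWING)  # ['Something']
```

For the markup in the question both expressions return the same table; which one to prefer depends on how strictly the table should be tied to the header's parent.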

score 12 · Accepted Answer · edited Mar 17 '16 at 10:17


You can use the XPath 1.0 expression below, which relies on the preceding axis.

 //table[preceding::p[1]/b[.='Header 2']]
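
As a rough illustration of how this predicate behaves, the sketch below runs it with lxml against a hypothetical document that has two headers, each followed by a table (this markup is an assumption, not part of the question). Because `preceding` is a reverse axis, `preceding::p[1]` is the nearest p before each table, so only the table right after Header 2 should match.

```python
from lxml import etree

# Hypothetical markup with a table after each header.
content = """<div>
<p><b>Header 1</b></p>
<table><tr><td>First</td></tr></table>
<p><b>Header 2</b></p>
<table><tr><td>Second</td></tr></table>
</div>"""

tree = etree.HTML(content)

# Keep only tables whose nearest preceding <p> holds a <b> equal to 'Header 2'.
tables = tree.xpath("//table[preceding::p[1]/b[.='Header 2']]")

# Text of the first cell of each matched table.
matched = [table.findtext(".//td") for table in tables]  # ['Second']
```

Note that any further table appearing after Header 2 with no p in between would match as well, since its nearest preceding p is still the Header 2 paragraph.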

edited Mar 17 '16 at 10:17 by bman · answered Oct 09 '13 at 18:35 by Arup Rakshit

Ah, ok. Thanks for the code and the docs link. That's helpful. – jseabold Oct 09 '13 at 18:46
