If you run the following Python code, you'll notice that it prints every <a> element in the entire document, when it should print only the one inside the first article.
How can I use XPath to first find the article tags, and then find only the links within each of them?
from lxml import html
source = '''
<body>
<a href='www.google.com'>outside 1</a>
<article class='art'>
<a href='www.google.com'>inside 1</a>
</article>
<article class='art'>
<a href='www.google.com'>inside 2</a>
</article>
<a href='www.google.com'>outside 2</a>
</body>
'''
tree_html = html.fromstring(source)
articles = tree_html.xpath('//article')
first_articles_a_text = articles[0].xpath('//a')
print(first_articles_a_text)
Output:
[<Element a at 0x47b05e8>, <Element a at 0x47b0598>, <Element a at 0x47b07c8>, <Element a at 0x47b0818>]
Note: I could not find a similar question or answer anywhere on SO or elsewhere online. Forgive me if I missed one.
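
For comparison, here is a minimal sketch of what seems to be going on, assuming the same source string as above. In XPath, an expression that starts with `//` is absolute, so it is evaluated from the root of the document even when it is called on a sub-element. Prefixing it with a dot (`.//a`) makes it relative to the element it is called on. The names `all_links` and `first_article_links` are only illustrative.

```python
from lxml import html

source = '''
<body>
<a href='www.google.com'>outside 1</a>
<article class='art'>
<a href='www.google.com'>inside 1</a>
</article>
<article class='art'>
<a href='www.google.com'>inside 2</a>
</article>
<a href='www.google.com'>outside 2</a>
</body>
'''

tree_html = html.fromstring(source)
articles = tree_html.xpath('//article')

# Absolute path: searches from the document root, so it ignores
# the fact that it was called on the first article.
all_links = articles[0].xpath('//a')

# Relative path: the leading dot restricts the search to the
# descendants of the element the query is called on.
first_article_links = articles[0].xpath('.//a')

print(len(all_links))                         # 4
print([a.text for a in first_article_links])  # ['inside 1']
```

To get the links of every article rather than just the first, the same relative query can be run in a loop, for example `[article.xpath('.//a') for article in articles]`.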