1

If you run the following Python code, you'll notice that it prints all of the tag references in the entire document, when it should only print 1.

How can you use xpath to first) search for article tags, and second) search for links within them?

from lxml import html

source = '''
<body>
    <a href='www.google.com'>outside 1</a>

    <article class='art'>
        <a href='www.google.com'>inside 1</a>
    </article>

    <article class='art'>
        <a href='www.google.com'>inside 2</a>
    </article>

    <a href='www.google.com'>outside 2</a>
</body>
'''

tree_html = html.fromstring(source)
articles = tree_html.xpath('//article')
first_articles_a_text = articles[0].xpath('//a')

print first_articles_a_text

Output:

[<Element a at 0x47b05e8>, <Element a at 0x47b0598>, <Element a at 0x47b07c8>, <Element a at 0x47b0818>]

Note : I could not find a similar answer anywhere on SO or online. Forgive me if I missed one.

Josh.F
  • 3,666
  • 2
  • 27
  • 37

1 Answers1

1

Start your xpath expression with a dot. This would make it search in the scope of the element:

first_articles_a_text = articles[0].xpath('.//a')

See also:

Community
  • 1
  • 1
alecxe
  • 462,703
  • 120
  • 1,088
  • 1,195
  • So each of the elements in the list [articles] really references the entire tree? (And thanks so much, this was a couple hour nightmare for me) – Josh.F Aug 27 '14 at 02:22