I'm trying to scrape some text off a website that has a list of products. What is the XPath to get the text of only the first occurrence of an element with a given class in each div? In the code below, I need the text of the first span with class "bar" inside each div with class "foo".

So I need the XPath that gives me only "year A", "year C", etc.

I'm new to this and have no clue how to do it. Many thanks for any help offered!

<div class="foo">                       
    <span class="bar">year A</span>
    <span class="qux">some text</span>
    <span class="bar">year B</span>
</div>

<div class="foo">                       
    <span class="bar">year C</span>
    <span class="qux">some text</span>
    <span class="bar">year D</span>
</div>

Etc.

With something like //span[@class='bar'][1]/text() one would only get "year A".

With something like //*[contains(@class, 'bar')]/text() one would get "year A", "year B", "year C" and "year D".

I'm scraping multiple pages and the number of items on each page is different. The class name "bar" is only used for the elements I need, so the problem described in "What is the XPath expression to find only the first occurrence?" does not apply.

Bob-78
2 Answers

This one worked fine in an XPath tester:

//div[@class='foo']/span[@class='bar'][1]/text()

Or, without text(), if you only need the elements themselves:

//div[@class='foo']/span[@class='bar'][1]
har07
  • Great, this seems to work well for the provided example. The accepted answer offers more precision when the code gets more complicated. – Bob-78 Aug 04 '14 at 11:08

The expression //div[@class = 'foo']/span[@class = 'bar'][1] selects, within each div whose class attribute is foo, the first child span whose class attribute is bar. If the class or name of the parent does not matter, use //*/span[@class = 'bar'][1].
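
For anyone who wants to check the per-div behaviour outside an XPath tester, here is a minimal Python sketch. It assumes the markup is well-formed enough to parse as XML, so the sample from the question is wrapped in a made-up <root> element. It uses only the standard library's xml.etree.ElementTree, which supports just a subset of XPath, so rather than relying on the [1] predicate it calls find() on each div, which returns the first matching child. The names SAMPLE and first_bar_texts are illustrative only. With a full XPath 1.0 engine (lxml, for example) the expression above should be usable directly.

```python
import xml.etree.ElementTree as ET

# Sample markup from the question, wrapped in a hypothetical <root>
# element so that it parses as a single XML document.
SAMPLE = """
<root>
  <div class="foo">
    <span class="bar">year A</span>
    <span class="qux">some text</span>
    <span class="bar">year B</span>
  </div>
  <div class="foo">
    <span class="bar">year C</span>
    <span class="qux">some text</span>
    <span class="bar">year D</span>
  </div>
</root>
"""


def first_bar_texts(xml_text):
    """Return the text of the first span.bar child of every div.foo."""
    root = ET.fromstring(xml_text)
    texts = []
    # Visit every div whose class attribute is exactly "foo".
    for div in root.findall(".//div[@class='foo']"):
        # find() returns only the first matching child, or None.
        first = div.find("span[@class='bar']")
        if first is not None and first.text is not None:
            texts.append(first.text.strip())
    return texts


print(first_bar_texts(SAMPLE))
```

Because the lookup is done per div, the number of items on a page does not matter: each div contributes at most one value.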

Martin Honnen