I am scraping this page to get the data for each ad: http://www.cars2buy.co.uk/business-car-leasing/Abarth/695C/?

Here is my code in scrapy shell:

scrapy shell "http://www.cars2buy.co.uk/business-car-leasing/Abarth/695C/"
for content in response.xpath('//*[@class="pitem"]/div[1]/div[2]/div[1]'):
    print content.xpath('//*[@class="detail"]/p/text()[2]').extract()

But it only extracts 48 in each iteration! The desired output should be:

48 months

48 months

48 months

36 months

48 months

48 months

48 months

48 months

48 months

36 months

according to the ads on the page! Any suggestions?

Hat hout

1 Answer

Easy fix. Try adding a . to the front of the second xpath:

print content.xpath('.//*[@class="detail"]/p/text()[2]').extract()

Explanation:

An XPath that starts with / is absolute: it means 'start searching at the document root' (and a leading // matches at any depth below that root). An XPath that starts with . is relative: it means 'start searching at the current position'. So it's very much like navigating directories in a filesystem.

So without the . your XPath expression extracted all matching elements anywhere on the page, and did so in each iteration.

Update/Addition

This also happens when the xpath expression is used on a sub-element ('selector' in scrapy lingo) like content in this example.

Scrapy internally keeps the whole html and starts from the document root when the xpath starts with /. Explained in detail here: https://doc.scrapy.org/en/latest/topics/selectors.html#working-with-relative-xpaths

Done Data Solutions
  • Thank you for your answer, but I thought that by using content.xpath instead of response.xpath, it would search only in the content, not the whole page. – Hat hout Apr 29 '17 at 19:45
  • Yes, that's what I thought too when I started using scrapy several years ago, and so I fell into the same weird trap. `content.xpath` can still access the whole document's html, so your xpath expression starts searching from the root. – Done Data Solutions Apr 29 '17 at 19:49
  • Thank you for sharing this. It's really what I need. – Hat hout Apr 29 '17 at 20:30