Please help me scrape product names from this link: http://www.gap.com/browse/category.do?cid=5168&scrollTo=product353401012&scrollTo=product353401012#pageId=0&department=75

The product names are in a div with class="product-card--name". When I run response.css('div.product-card--name::text').extract(), it returns an empty list.

Please provide both CSS and XPath commands.

Light
  • The content comes in via AJAX requests, so you will need to use the URL of those requests. (*It is http://www.gap.com/resources/productSearch/v1/search?cid=5168 and returns JSON.*) – Gabriele Petrioli Dec 05 '16 at 19:34

1 Answer

As Gaby said, the content is loaded dynamically. You can verify this as follows:

  • Open the website you want to scrape in Chrome (Firefox has an equivalent tool)
  • Press F12 to open DevTools
  • Select the 'Network' tab
  • Select 'XHR' as the filter
  • Run a search (or reload the page)

[Screenshot: XHR filter]

You will see a list of requests; the one you want is:

search?cid=5168&isFacetsEnabled=true&globalShippingCountryCode=&globalShippingCurrencyCode=&locale=en_US&pageId=0

If you click on it, you can see the HTTP request with its headers and the response containing all the data you want.
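As a rough sketch of working with that request outside the browser, the snippet below rebuilds the XHR URL and pulls values out of the parsed JSON. The endpoint path is assumed from the comment under the question, and the query parameters are copied from the XHR entry above. The helper names `build_search_url` and `collect_values` are made up for illustration, and the real layout of the JSON is not shown in this thread, so the key to look for has to be read from the response in DevTools.

```python
from urllib.parse import urlencode

# Assumed endpoint: the path comes from the comment under the question,
# the query parameters from the XHR entry in the Network tab.
SEARCH_ENDPOINT = "http://www.gap.com/resources/productSearch/v1/search"


def build_search_url(cid, page_id=0, locale="en_US"):
    """Rebuild the XHR URL for one page of a category (hypothetical helper)."""
    params = {
        "cid": cid,
        "isFacetsEnabled": "true",
        "globalShippingCountryCode": "",
        "globalShippingCurrencyCode": "",
        "locale": locale,
        "pageId": page_id,
    }
    return SEARCH_ENDPOINT + "?" + urlencode(params)


def collect_values(node, key):
    """Recursively gather every string stored under `key` in parsed JSON."""
    found = []
    if isinstance(node, dict):
        for k, v in node.items():
            if k == key and isinstance(v, str):
                found.append(v)
            else:
                # Keep descending into nested dicts and lists.
                found.extend(collect_values(v, key))
    elif isinstance(node, list):
        for item in node:
            found.extend(collect_values(item, key))
    return found
```

Once the response body has been loaded with `json.loads`, something like `collect_values(data, "name")` would return the product names, assuming the field is actually called `name`; check the response preview in DevTools for the real key.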

Doing this in Scrapy is a bit more involved: you have to request this URL using the "POST" method instead of the default "GET". From a Scrapy spider:

yield scrapy.Request(url, self.parse_data, method="POST", headers=headers, body=body)

Here, url is the one you found with the XHR filter, the method is "POST", headers are the ones copied from the request you inspected earlier, and body carries the parameters specific to your search. The response is JSON, which you can save to a file or process however you like.

If you need more details let me know.

vmmc
  • Can you please help me with the complete code for getting the product names? I am new to Scrapy. And why can't the product names be scraped even though they are in the HTML? [Click on this](http://i.imgur.com/FqEwhJg.png). – Light Dec 06 '16 at 06:54