0

I can fetch most of the details from the desired website, but I am unable to get some specific information. Please guide me.

Target URL: https://shop.adidas.ae/en/messi-16-3-indoor-boots/BA9855.html

My code is: `response.xpath('//ul[@class="product-size"]//li/text()').extract()`


I need to fetch the size data.

Thanks!

Zia

2 Answers

2

E-commerce websites often embed their data as JSON in the page source and then have JavaScript unpack it on the user's end.

In this case you can open the page source with JavaScript disabled and search for keywords (like a specific size).

Here I found that the data can be extracted with a regular expression:
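That keyword search can also be scripted. The following is only a minimal sketch: `keyword_contexts` and `sample_source` are hypothetical names, and the sample string is a made-up stand-in for the page source (in a Scrapy callback you would pass `response.text` instead).

```python
def keyword_contexts(page_source, keyword, width=40):
    """Return the text surrounding each occurrence of keyword in page_source."""
    contexts = []
    start = page_source.find(keyword)
    while start != -1:
        begin = max(0, start - width)
        end = start + len(keyword) + width
        contexts.append(page_source[begin:end])
        # Continue searching after the current hit
        start = page_source.find(keyword, start + 1)
    return contexts


# Hypothetical page fragment, used here only for illustration
sample_source = '<script>window.assets.sizesMap = {"16": {"uk": "0k", "us": "0.5"}};</script>'

for snippet in keyword_contexts(sample_source, '"uk"'):
    print(snippet)
```

The surrounding text usually reveals which JavaScript variable holds the data, which is what the regular expression then targets.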

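import re
import json
# Capture the object literal assigned to window.assets.sizesMap in the page source.
# response.text replaces the deprecated response.body_as_unicode().
data = re.findall(r'window.assets.sizesMap = (\{.+?\});', response.text)
json.loads(data[0])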
Out: 
{'16': {'uk': '0k', 'us': '0.5'},
 '17': {'uk': '1k', 'us': '1'},
 '18': {'uk': '2k', 'us': '2.5'},
 ...}

Edit: More accurately, you probably want a different part of the JSON, but the approach is more or less the same:

data = re.findall(r'window.assets.sizes = (\{(?:.|\n)+?\});', response.text)
json.loads(data[0].replace("'", '"'))  # replace single quotes with double quotes so the text parses as JSON
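As a rough, self-contained sketch of the same idea, the extraction can be wrapped in a helper that returns `None` instead of raising `IndexError` when the pattern does not match. The names `extract_js_object` and `sample_source` are hypothetical, and the sample fragment is only modeled on the output shown above, not copied from the real page.

```python
import json
import re


def extract_js_object(page_source, variable_name):
    """Return the object literal assigned to variable_name as a dict, or None."""
    pattern = re.escape(variable_name) + r"\s*=\s*(\{.+?\});"
    # re.DOTALL lets "." match newlines, so multi-line objects are captured too
    match = re.search(pattern, page_source, flags=re.DOTALL)
    if match is None:
        return None  # no match: avoids indexing into an empty result list
    raw = match.group(1)
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        # Fallback for single-quoted literals; breaks if a value contains an apostrophe
        return json.loads(raw.replace("'", '"'))


# Hypothetical page fragment, modeled on the output shown above
sample_source = """
<script>
window.assets.sizesMap = {"16": {"uk": "0k", "us": "0.5"},
                          "17": {"uk": "1k", "us": "1"}};
</script>
"""

sizes_map = extract_js_object(sample_source, "window.assets.sizesMap")
uk_sizes = [entry["uk"] for entry in sizes_map.values()]
print(uk_sizes)
```

The same list comprehension would work for any other key, such as `'eu'`, provided the real object actually contains it.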
Granitosaurus
  • I am getting an error when applying your code: `IndexError Traceback (most recent call last) in () 1 data = re.findall('window.assets.sizes = (\{.+?\});', response.body_as_unicode()) ----> 2 json.loads(data[0].replace("'", '"')) IndexError: list index out of range` – Zia Jun 06 '17 at 10:12
  • It seems your regular expression is different. Mine is `'window.assets.sizes = (\{(?:.|\n)+?\});'` vs. yours, `'window.assets.sizes = (\{.+?\});'`. – Granitosaurus Jun 07 '17 at 05:45
  • Sir @Granitosaurus, can you help me with Splash? How do I install it on my machine to accomplish this task? I tried but could not get it working. – Zia Jun 07 '17 at 06:12
  • @MuhammadZiaUrRahman You should open a new question describing the issues you are facing with your Splash installation. I'm not sure Splash even supports the Windows platform, as there are no install instructions for it in the [docs](http://splash.readthedocs.io/en/stable/install.html). Either way, as my example showed, Splash is unnecessary in your case. – Granitosaurus Jun 07 '17 at 07:23
  • Sir, with your approach, how can I get only the 'eu' values from the object? Please guide me, thanks. – Zia Jun 07 '17 at 07:27
1

The data you want to fetch is loaded by JavaScript. The tag's class="js-size-value " says so explicitly.

To get it, you will need a rendering service. I suggest Splash; it is simple to install and simple to use. You will need Docker to install Splash.

Adrien Blanquer
  • While it's a great lazy option, it's far from necessary. You can replicate the JavaScript with Python in Scrapy by digging around a bit. – Granitosaurus Jun 06 '17 at 09:35
  • Yes, I agree. I'll dig more before using Splash everywhere. Thanks for the tip :-) – Adrien Blanquer Jun 06 '17 at 09:41
  • @Blanquer Adrien Does that mean it is possible to get the JavaScript-based data with Scrapy alone? – Zia Jun 06 '17 at 09:45
  • @BLANQUERAdrien Could you please give me a Splash code example to get the data? I am trying but could not get the result. Thanks. – Zia Jun 06 '17 at 10:27
  • @BLANQUERAdrien Docker is not running on my Windows 10 machine. Could you please help me accomplish this task, sir? Thanks. – Zia Jun 07 '17 at 06:06
  • @Muhammad Zia Ur Rahman Sorry for the delay. If you are working on Windows, you should install `docker` following this [tutorial](https://docs.docker.com/docker-for-windows/install/). Once you have installed Docker, check the [splash documentation](http://splash.readthedocs.io/en/latest/install.html#installation) and install it. Let me know :-) – Adrien Blanquer Jun 07 '17 at 07:27
  • Thanks for your kind reply. I installed Docker Toolbox, but I am unable to run it. Please guide me further, thanks. – Zia Jun 07 '17 at 07:29
  • I'm afraid it would be complicated to debug your Docker installation here. I suggest you make another post for this particular problem. – Adrien Blanquer Jun 07 '17 at 07:32
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/146029/discussion-between-muhammad-zia-ur-rahman-and-blanquer-adrien). – Zia Jun 07 '17 at 07:32