
I've been trying to scrape the following website with the currency changed to 'SAR' from the settings form in the upper left. I tried sending a Scrapy request like this:

from scrapy import Request

r = Request(url='https://www.mooda.com/en/',
            cookies=[{'name': 'currency', 'value': 'SAR',
                      'domain': '.www.mooda.com', 'path': '/'},
                     {'name': 'country', 'value': 'SA',
                      'domain': '.www.mooda.com', 'path': '/'}],
            dont_filter=True)

and I still get the prices in EG£ (Egyptian pounds):

In [10]: response.css('.price').xpath('text()').extract()
Out[10]: 
[u'1,957 EG\xa3',
 u'3,736 EG\xa3',
 u'2,802 EG\xa3',
 u'10,380 EG\xa3',
 u'1,823 EG\xa3']

I have also tried sending a POST request with the specified form data, like this:

from scrapy.http import FormRequest

url = 'https://www.mooda.com/en/'
r = FormRequest(url=url, formdata={
    'selectCurrency': 'https://www.mooda.com/en/directory/currency/switch/currency/SAR/uenc/aHR0cHM6Ly93d3cubW9vZGEuY29tL2VuLw,,/'})
fetch(r)

Still, neither worked. I also tried FormRequest.from_response(), but it never worked either. I'd really appreciate some advice; I'm new to Scrapy form requests, and I'd be thankful for any help.
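
For context, the from_response() attempt looked something along these lines (a sketch on my part; the form XPath, field name, and callback are guesses, since the currency switcher is a <select> rather than an obvious form):

from scrapy.http import FormRequest

def parse(self, response):
    # hypothetical: locate the form wrapping the currency <select> (assumed markup)
    yield FormRequest.from_response(
        response,
        formxpath="//select[@id='selectCurrency']/ancestor::form",
        formdata={'selectCurrency': 'SAR'},  # assumed field name and value
        callback=self.parse_prices,          # hypothetical callback
    )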

M.nabil.H

1 Answer


It is all about the frontend cookie. I will show you how to do it with requests first; the logic will be exactly the same with Scrapy:

import requests
from bs4 import BeautifulSoup

head = {"User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:47.0) Gecko/20100101 Firefox/47.0"}

with requests.Session() as s:
    # find the currency-switch URL inside the SAR option under #selectCurrency
    soup = BeautifulSoup(s.get("https://www.mooda.com/en/").content, "lxml")
    # hitting that URL sets the frontend and currency cookies
    r2 = s.get(soup.select_one("#selectCurrency option[value*=SAR]")["value"])
    # pass those cookies along when requesting the page with ?currency=sar
    r = s.get("https://www.mooda.com/en/", params={"currency": "sar"},
              headers=head, cookies=dict(r2.cookies.items()))
    soup2 = BeautifulSoup(r.content, "lxml")
    print(soup2.select_one(".price").text)

You need to make a request to the URL found in the option under the select with the id selectCurrency, then pass the cookies returned from that request when you request https://www.mooda.com/en?currency=sar. There are no POSTs; it is all GET requests, but the frontend cookie from that first GET is essential.

If we run the code, you see it does give us the correct data:

In [9]: with requests.Session() as s:
   ...:         soup = BeautifulSoup(s.get("https://www.mooda.com/en/").content,"lxml")
   ...:         r2 = s.get(soup.select_one("#selectCurrency option[value*=SAR]")["value"])
   ...:         r = s.get("https://www.mooda.com/en/", params={"currency": "sar"}, headers=head, cookies=dict(r2.cookies.items()))
   ...:         soup2 = BeautifulSoup(r.content,"lxml")
   ...:         print(soup2.select_one(".price").text)
   ...:     

825 SR

Using Scrapy:

from scrapy import Spider, Request


class S(Spider):
    name = "foo"
    allowed_domains = ["www.mooda.com"]
    start_urls = ["https://www.mooda.com/en"]

    def parse(self, resp):
        # pull the currency-switch URL out of the SAR option
        curr = resp.css("#selectCurrency option[value*='SAR']::attr(value)").extract_first()
        return Request(curr, callback=self.parse2)

    def parse2(self, resp):
        print(resp.headers.getlist('Set-Cookie'))
        # rebuild a cookies dict from the Set-Cookie headers the switch URL returned
        cookies = {}
        for header in resp.headers.getlist('Set-Cookie'):
            if isinstance(header, bytes):
                header = header.decode('utf-8')
            name, _, value = header.split(';', 1)[0].partition('=')
            cookies[name] = value
        return Request("https://www.mooda.com/en?currency=sar",
                       cookies=cookies, callback=self.parse3)

    def parse3(self, resp):
        print(resp.css('.price').xpath('text()').extract())
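
Note that Scrapy's built-in CookiesMiddleware already persists cookies across requests within a crawl, so a simpler variant of parse2 (a sketch, not part of the original answer) can skip the manual cookie handling entirely:

    def parse2(self, resp):
        # sketch: CookiesMiddleware has already stored the cookies set by the
        # switch URL, so they are re-sent automatically on this request
        return Request("https://www.mooda.com/en?currency=sar", callback=self.parse3)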

Which if you run will give you:

['frontend=c95er9h1at2srhtqu5rkfo13g0; expires=Wed, 28-Jun-2017 08:56:08 GMT; path=/; domain=www.mooda.com', 'currency=SAR; expires=Wed, 28-Jun-2017 08:56:08 GMT; path=/; domain=www.mooda.com']


[u'825 SR', u'1,575 SR', u'1,181 SR', u'4,377 SR', u'769 SR']

The GET to curr returns nothing useful in its body; it just sets the cookies.
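
A quick way to confirm that with requests (a sketch; the printed values are illustrative):

import requests
from bs4 import BeautifulSoup

with requests.Session() as s:
    soup = BeautifulSoup(s.get("https://www.mooda.com/en/").content, "lxml")
    switch_url = soup.select_one("#selectCurrency option[value*=SAR]")["value"]
    r2 = s.get(switch_url)
    print(len(r2.content))            # little to no body content
    print(dict(r2.cookies.items()))   # the frontend and currency cookies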

Padraic Cunningham
  • Thank you very much, that was really helpful. I didn't know it was as simple as requesting url='https://www.mooda.com/en/directory/currency/switch/currency/SAR/uenc/aHR0cHM6Ly93d3cubW9vZGEuY29tL2VuL3Nob2VzL2JhbGxlcmluYXM,/' and then using its cookies to request the website's URL. Thanks for your effort. – M.nabil.H Jun 28 '16 at 13:51
  • No worries, not actually the most obvious solution, only simple after you figure out how ;) – Padraic Cunningham Jun 28 '16 at 20:41