I am trying to fetch the historical data for this page, starting from 01/01/2018, in the Scrapy shell.
After analyzing the request, I found that its form data looks like this:
In [124]: form
Out[124]:
{'action': 'historical_data',
'curr_id': '44765',
'end_date': '07/04/2020',
'header': 'YANG+Historical+Data',
'interval_sec': 'Daily',
'smlID': '2520420',
'sort_col': 'date',
'sort_ord': 'DESC',
'st_date': '01/01/2018'}
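One thing worth checking in the form data above is the `header` value. The following is a minimal sketch, assuming the form data gets URL-encoded with the standard-library rules of `urllib.parse.urlencode`. It is also an assumption, not something confirmed here, that the `+` signs in the captured value stand for spaces. It shows that a literal `+` in a value is sent as `%2B`, while a space is sent as `+`, so the encoded body can differ from the one the browser sent:

```python
from urllib.parse import urlencode

# Value copied from the captured request; the '+' signs are assumed to be
# the already-encoded form of spaces.
captured = {'header': 'YANG+Historical+Data'}
# Assumed decoded value: the same text with real spaces.
decoded = {'header': 'YANG Historical Data'}

encoded_captured = urlencode(captured)
encoded_decoded = urlencode(decoded)

print(encoded_captured)  # header=YANG%2BHistorical%2BData
print(encoded_decoded)   # header=YANG+Historical+Data
```

If the two bodies differ in length, a hard-coded `Content-Length` would no longer match whichever body is actually sent.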
The request URL and headers look like this:
In [125]: url
Out[125]: 'https://www.investing.com/instruments/HistoricalDataAjax'
In [126]: head
Out[126]:
({'name': 'Accept', 'value': 'text/plain, */*; q=0.01'},
{'name': 'Accept-Encoding', 'value': 'gzip, deflate, br'},
{'name': 'Accept-Language', 'value': 'en-US,en;q=0.5'},
{'name': 'Cache-Control', 'value': 'no-cache'},
{'name': 'Connection', 'value': 'keep-alive'},
{'name': 'Content-Length', 'value': '172'},
{'name': 'Content-Type', 'value': 'application/x-www-form-urlencoded'},
{'name': 'Host', 'value': 'www.investing.com'},
{'name': 'Origin', 'value': 'https://www.investing.com'},
{'name': 'Pragma', 'value': 'no-cache'},
{'name': 'User-Agent',
'value': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:78.0) Gecko/20100101 Firefox/78.0'},
{'name': 'X-Requested-With', 'value': 'XMLHttpRequest'})
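The `head` value above is a tuple of name/value pairs as exported by the browser, whereas request headers are usually passed as a plain mapping of name to value. A minimal sketch of flattening it, assuming that `Content-Length` and `Host` can be left for the HTTP client to compute (the helper name `flatten_headers` is hypothetical, and only a few of the headers are repeated here for brevity):

```python
def flatten_headers(pairs, skip=('content-length', 'host')):
    """Turn ({'name': ..., 'value': ...}, ...) into {name: value}.

    Headers whose lowercased name is in `skip` are dropped, on the
    assumption that the HTTP client sets them itself.
    """
    return {p['name']: p['value'] for p in pairs if p['name'].lower() not in skip}


# A shortened copy of the exported headers, for illustration only.
head = (
    {'name': 'Content-Length', 'value': '172'},
    {'name': 'Content-Type', 'value': 'application/x-www-form-urlencoded'},
    {'name': 'Host', 'value': 'www.investing.com'},
    {'name': 'X-Requested-With', 'value': 'XMLHttpRequest'},
)

headers = flatten_headers(head)
print(headers)
```

The resulting `headers` dict could then be passed in place of `head`; whether that alone avoids the redirect is not verified here.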
But when I make the request, it is redirected to the website's home page:
In [127]: fetch(scrapy.FormRequest(url, method='POST', headers=head, formdata=form))
2020-07-04 12:39:39 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (302) to <GET https://www.investing.com/> from <POST https://www.investing.com/instruments/HistoricalDataAjax>
2020-07-04 12:39:40 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.investing.com/> (referer: None)
Update:
These headers work fine in the browser's developer console and return the correct response, but in the Scrapy shell I get a 400 error:
In [13]: header
Out[13]:
{'Accept': 'text/plain, */*; q=0.01',
'Accept-Encoding': 'gzip, deflate, br',
'Accept-Language': 'en-US,en;q=0.5',
'Cache-Control': 'no-cache',
'Connection': 'keep-alive',
'Content-Length': '172',
'Content-Type': 'application/x-www-form-urlencoded',
'Host': 'www.investing.com',
'Origin': 'https://www.investing.com',
'Pragma': 'no-cache',
'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:78.0) Gecko/20100101 Firefox/78.0',
'X-Requested-With': 'XMLHttpRequest'}
I know I am making a mistake somewhere, but I can't figure out where.
I searched a lot and tried various approaches, such as from_request() and Request(url, method='POST', headers=head, body=payload); posting here was my last resort.