
When I use the Python requests module for the following HTTP request, it returns a dict of exactly what I need:

import requests

payload = {'x-algolia-application-id':'Q0TMLOPF1J','x-algolia-api-key':'30a0c84a152d179ea8aa1a7a59374d08', 'hitsPerPage':'40', 'numericFilters': ['startdate > 1511095966851'],'facets': '*' }  

url = 'https://q0tmlopf1j-3.algolianet.com/1/indexes/sitecore-events'

r = requests.get(url, params=payload).json()

However, when I instead try to implement this as a Scrapy Request so that I can parse the results:

def start_requests(self):
    payload = {'x-algolia-application-id':'Q0TMLOPF1J','x-algolia-api-key':'30a0c84a152d179ea8aa1a7a59374d08', 'hitsPerPage':'40', 'numericFilters': ['startdate > 1511095966851'],'facets': '*' }  

    url = 'https://q0tmlopf1j-3.algolianet.com/1/indexes/sitecore-events'

    yield scrapy.Request(url,
                         body=json.dumps(payload),
                         method='GET',
                         callback=self.parse_item)

def parse_item(self, response):
    # I want to parse the dict here
    pass

I get a 403 error. I know there is something simple I am doing wrong; what is it?

NFB
  • https://stackoverflow.com/a/33747209/8150371 – Stack Jan 09 '18 at 21:27
  • Yeah, I've tried this. It still gives a 403. – NFB Jan 09 '18 at 21:29
  • Specifically: the site API returns an error that the API key or application ID is invalid, which is not the case, since the same credentials work using requests. – NFB Jan 09 '18 at 21:35
  • Check the URL by printing it after it is encoded. – Stack Jan 09 '18 at 21:36
  • It is: https://q0tmlopf1j-3.algolianet.com/1/indexes/sitecore-events/x-algolia-application-id=Q0TMLOPF1J&x-algolia-api-key=30a0c84a152d179ea8aa1a7a59374d08&hitsPerPage=40&facets=*, which is what returns the errors mentioned above. – NFB Jan 09 '18 at 21:40
  • You need to add `?` instead of `/` after sitecore-events; change to `https://q0tmlopf1j-3.algolianet.com/1/indexes/sitecore-events?x-algolia-application-id=Q0TMLOPF1J&x-algolia-api-key=30a0c84a152d179ea8aa1a7a59374d08&hitsPerPage=40&facets=` – Stack Jan 09 '18 at 21:43
  • It still returns the same error. – NFB Jan 09 '18 at 21:45
  • Compare it with the `requests` URL; you can access that URL via `r.url`. – Stack Jan 09 '18 at 21:46
  • This brought an interesting discovery: the dict element `'numericFilters': ['startdate > 1511095966851']` was not correctly converted to a URL parameter by Scrapy, which was the problem. Requests was stripping it out altogether. I removed it manually from the Scrapy request, since it was an inessential filter, and it works (see the encoding sketch after these comments). – NFB Jan 09 '18 at 22:17
  • ok great :) ... – Stack Jan 10 '18 at 05:07
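
One way to see the difference without hitting the API is to compare the two serializations directly. This is a small standalone sketch using the payload and URL from the question: preparing a requests request exposes the encoded URL (the `r.url` mentioned above) without sending anything, while `json.dumps(payload)` is what was being placed in the body of the Scrapy GET request.

import json
import requests

payload = {
    'x-algolia-application-id': 'Q0TMLOPF1J',
    'x-algolia-api-key': '30a0c84a152d179ea8aa1a7a59374d08',
    'hitsPerPage': '40',
    'numericFilters': ['startdate > 1511095966851'],
    'facets': '*',
}
url = 'https://q0tmlopf1j-3.algolianet.com/1/indexes/sitecore-events'

# The prepared request shows the query string exactly as requests would send it,
# including the list-valued numericFilters entry.
prepared = requests.Request('GET', url, params=payload).prepare()
print(prepared.url)

# This is the JSON string that was being passed as the body of the Scrapy GET
# request; a request body never becomes part of the query string.
print(json.dumps(payload))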

1 Answer


I know you've "solved" the problem by leaving out a parameter, but the correct way to do this would be to use a `FormRequest`:

yield scrapy.FormRequest(
    url=url,
    method='GET',
    formdata=payload,
    callback=self.parse_item
)
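
For context, a minimal spider sketch putting that request to use might look like the following (the class and spider name are placeholder choices; the payload, URL, and callback come from the question):

import json

import scrapy


class EventsSpider(scrapy.Spider):
    # Placeholder class/spider name; only start_requests/parse_item come from the question.
    name = 'events'

    def start_requests(self):
        payload = {
            'x-algolia-application-id': 'Q0TMLOPF1J',
            'x-algolia-api-key': '30a0c84a152d179ea8aa1a7a59374d08',
            'hitsPerPage': '40',
            # formdata values may be strings or lists of strings, so the
            # list-valued filter can stay here.
            'numericFilters': ['startdate > 1511095966851'],
            'facets': '*',
        }
        url = 'https://q0tmlopf1j-3.algolianet.com/1/indexes/sitecore-events'

        # For a GET request, FormRequest url-encodes formdata into the query
        # string, which is what requests' params= argument was doing.
        yield scrapy.FormRequest(
            url=url,
            method='GET',
            formdata=payload,
            callback=self.parse_item,
        )

    def parse_item(self, response):
        data = json.loads(response.text)  # the same dict that requests' .json() returned
        yield data
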
stranac
  • Do you mean this would correctly parse the omitted parameter? It does not; it still returns the 403 error. It does work without it, however. – NFB Jan 09 '18 at 22:52
  • Works perfectly for me. I get the same response as with requests. – stranac Jan 09 '18 at 22:55