
When I use the Python requests module for the following HTTP request, it returns a dict of exactly what I need:

import requests

payload = {'x-algolia-application-id':'Q0TMLOPF1J','x-algolia-api-key':'30a0c84a152d179ea8aa1a7a59374d08', 'hitsPerPage':'40', 'numericFilters': ['startdate > 1511095966851'],'facets': '*' }  

url = 'https://q0tmlopf1j-3.algolianet.com/1/indexes/sitecore-events'

r = requests.get(url, params=payload).json()

However, when I instead try to implement this as a Scrapy Request so that I can parse the results:

def start_requests(self):
    payload = {'x-algolia-application-id':'Q0TMLOPF1J','x-algolia-api-key':'30a0c84a152d179ea8aa1a7a59374d08', 'hitsPerPage':'40', 'numericFilters': ['startdate > 1511095966851'],'facets': '*' }  

    url = 'https://q0tmlopf1j-3.algolianet.com/1/indexes/sitecore-events'

    yield scrapy.Request(url,
                         body=json.dumps(payload),
                         method='GET',
                         callback=self.parse_item)

def parse_item(self, response):
    # I want to parse the dict here
    pass

I get a 403 error. I know there is something simple I am doing wrong; what is it?

NFB
  • https://stackoverflow.com/a/33747209/8150371 – Stack Jan 09 '18 at 21:27
  • Yeah, I've tried this. It still gives a 403. – NFB Jan 09 '18 at 21:29
  • Specifically: the site API returns an error that the API key or application ID is invalid, which is not the case, since the same credentials work using requests. – NFB Jan 09 '18 at 21:35
  • Check the URL by printing it after it is encoded. – Stack Jan 09 '18 at 21:36
  • It is: https://q0tmlopf1j-3.algolianet.com/1/indexes/sitecore-events/x-algolia-application-id=Q0TMLOPF1J&x-algolia-api-key=30a0c84a152d179ea8aa1a7a59374d08&hitsPerPage=40&facets=*, which is what returns the errors mentioned above. – NFB Jan 09 '18 at 21:40
  • You need to add `?` instead of `/` after sitecore-events; change to `https://q0tmlopf1j-3.algolianet.com/1/indexes/sitecore-events?x-algolia-application-id=Q0TMLOPF1J&x-algolia-api-key=30a0c84a152d179ea8aa1a7a59374d08&hitsPerPage=40&facets=` – Stack Jan 09 '18 at 21:43
  • It still returns the same error. – NFB Jan 09 '18 at 21:45
  • Compare it with the `requests` URL; you can access that URL via `r.url`. – Stack Jan 09 '18 at 21:46
  • This brought an interesting discovery: the dict element `'numericFilters': ['startdate > 1511095966851']` was not correctly converted to a URL parameter by Scrapy, which was the problem. Requests was stripping it out altogether. I removed it manually from the Scrapy request, since it was an inessential filter, and it works (see the encoding sketch after these comments). – NFB Jan 09 '18 at 22:17
  • ok great :) ... – Stack Jan 10 '18 at 05:07
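
One way to see the difference without hitting the API is to compare the two serializations directly. This is a small standalone sketch using the payload and URL from the question: preparing a requests request exposes the encoded URL (the `r.url` mentioned above) without sending anything, while `json.dumps(payload)` is what was being placed in the body of the Scrapy GET request.

import json
import requests

payload = {
    'x-algolia-application-id': 'Q0TMLOPF1J',
    'x-algolia-api-key': '30a0c84a152d179ea8aa1a7a59374d08',
    'hitsPerPage': '40',
    'numericFilters': ['startdate > 1511095966851'],
    'facets': '*',
}
url = 'https://q0tmlopf1j-3.algolianet.com/1/indexes/sitecore-events'

# The prepared request shows the query string exactly as requests would send it,
# including the list-valued numericFilters entry.
prepared = requests.Request('GET', url, params=payload).prepare()
print(prepared.url)

# This is the JSON string that was being passed as the body of the Scrapy GET
# request; a request body never becomes part of the query string.
print(json.dumps(payload))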

1 Answer


I know you've "solved" the problem by leaving out a parameter, but the correct way to do this would be to use a `FormRequest`:

yield scrapy.FormRequest(
    url=url,
    method='GET',
    formdata=payload,
    callback=self.parse_item
)
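
For context, a minimal spider sketch putting that request to use might look like the following (the class and spider name are placeholder choices; the payload, URL, and callback come from the question):

import json

import scrapy


class EventsSpider(scrapy.Spider):
    # Placeholder class/spider name; only start_requests/parse_item come from the question.
    name = 'events'

    def start_requests(self):
        payload = {
            'x-algolia-application-id': 'Q0TMLOPF1J',
            'x-algolia-api-key': '30a0c84a152d179ea8aa1a7a59374d08',
            'hitsPerPage': '40',
            # formdata values may be strings or lists of strings, so the
            # list-valued filter can stay here.
            'numericFilters': ['startdate > 1511095966851'],
            'facets': '*',
        }
        url = 'https://q0tmlopf1j-3.algolianet.com/1/indexes/sitecore-events'

        # For a GET request, FormRequest url-encodes formdata into the query
        # string, which is what requests' params= argument was doing.
        yield scrapy.FormRequest(
            url=url,
            method='GET',
            formdata=payload,
            callback=self.parse_item,
        )

    def parse_item(self, response):
        data = json.loads(response.text)  # the same dict that requests' .json() returned
        yield data
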
stranac
  • Do you mean this would correctly parse the omitted parameter? It does not; it still returns the 403 error. It does work without it, however. – NFB Jan 09 '18 at 22:52
  • Works perfectly for me. I get the same response as with requests. – stranac Jan 09 '18 at 22:55