
I've been working with Firebug and I've got the following dictionaries to query an API.

url = "htp://my_url.aspx#top"

querystring = {"dbkey":"x1","stype":"id","s":"27"}

headers = {
    'accept': "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8",
    'upgrade-insecure-requests': "1",
    'user-agent': "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/44.0.2403.125",
    }

With Python requests, using these is as simple as:

import requests
response = requests.request("GET", url, headers=headers, params=querystring)
print(response.text)

How can I use these in Scrapy? I've been reading http://doc.scrapy.org/en/latest/topics/request-response.html and I know that the following works for POST:

    r = Request(my_url, method="post", headers=headers, body=payload, callback=self.parse_method)

I've tried:

    r = Request("GET", url, headers=headers, body=querystring, callback=self.parse_third_request)

I'm getting:

r = Request("GET", url, headers=headers, body=querystring, callback=self.parse_third_request)
TypeError: __init__() got multiple values for keyword argument 'callback'

Edit:

I changed it to:

    r = Request(method="GET", url=url, headers=headers, body=querystring, callback=self.parse_third_request)

Now I'm getting:

  File "C:\envs\r2\tutorial\tutorial\spiders\parker_spider.py", line 90, in parse_second_request
    r = Request(method="GET", url=url, headers=headers, body=querystring, callback=self.parse_third_request)
  File "C:\envs\virtalenvs\teat\lib\site-packages\scrapy\http\request\__init__.py", line 26, in __init__
    self._set_body(body)
  File "C:\envs\virtalenvs\teat\lib\site-packages\scrapy\http\request\__init__.py", line 68, in _set_body
    self._body = to_bytes(body, self.encoding)
  File "C:\envs\virtalenvs\teat\lib\site-packages\scrapy\utils\python.py", line 117, in to_bytes
    'object, got %s' % type(text).__name__)
TypeError: to_bytes must receive a unicode, str or bytes object, got dict

Edit 2:

I now have:

    yield Request(method="GET", url=url, headers=headers, body=urllib.urlencode(querystring), callback=self.parse_third_request)

    def parse_third_request(self, response):
        from scrapy.shell import inspect_response
        inspect_response(response, self)
        print("hi")
        return None

There are no errors, but in the shell, when I do "response.url" I only get the base URL with no GET parameters.

user1592380

1 Answer


Look at the signature of the Request initialization method:

class scrapy.http.Request(url[, callback, method='GET', headers, body, cookies, meta, encoding='utf-8', priority=0, dont_filter=False, errback])

The "GET" string in your case is used as a positional value for the callback argument.

Use a keyword argument for the method instead (though GET is the default):

r = Request(url, method="GET", headers=headers, body=querystring, callback=self.parse_third_request)
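Note that, unlike requests, Scrapy's Request has no params argument, so a query dict has to be encoded into the URL itself (a body on a GET request is generally ignored). A minimal sketch, using the standard library's urlencode (which in Python 2, as in the question, lives at urllib.urlencode):

```python
from urllib.parse import urlencode  # Python 3; in Python 2: from urllib import urlencode

querystring = {"dbkey": "x1", "stype": "id", "s": "27"}
base = "htp://my_url.aspx"

# Fold the query dict into the URL, since Request() takes no params argument
full_url = base + "?" + urlencode(querystring)
# full_url == "htp://my_url.aspx?dbkey=x1&stype=id&s=27"
```

The resulting full_url can then be passed as the url argument to Request.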
alecxe
  • 462,703
  • 120
  • 1,088
  • 1,195
  • Thanks, it seems that the problem is now that I'm loading a dict for the parameters. Is there an easy fix? – user1592380 Jun 04 '16 at 17:32
  • @user61629 ah, yeah, you can use the `json.dumps(querystring)`..don't forget to `import json`. – alecxe Jun 04 '16 at 17:41
  • 1
    @user61629 you know what, the last suggestion might not be correct. I think the `urllib.urlencode(querystring)` should be used instead. Let me know what worked for you. Thanks. – alecxe Jun 04 '16 at 17:45
  • 1
    @user61629 okay, I'm afraid you cannot use `body`, try with something like `url + '?' + urllib.urlencode(querystring)`. Let me know if it helps or not. – alecxe Jun 04 '16 at 18:19
  • The problem now is that it's producing something of the form htp://my_url.aspx#top?s=27&s=d&dbkey=jkl; "top" should be at the end. Maybe I should use requests to form the url. – user1592380 Jun 04 '16 at 18:35
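The fragment-ordering problem in the last comment comes from appending the query after a URL that already ends in "#top": per the URL syntax, the query string must come before the fragment. A small sketch that splices the encoded query in the right place using the standard library (urlsplit/urlunsplit handle the fragment for you; it does not attempt to merge with any pre-existing query):

```python
from urllib.parse import urlsplit, urlunsplit, urlencode

def add_query(url, params):
    """Insert an encoded query string before any #fragment in the URL."""
    scheme, netloc, path, query, fragment = urlsplit(url)
    return urlunsplit((scheme, netloc, path, urlencode(params), fragment))

# Using the placeholder URL from the question:
print(add_query("htp://my_url.aspx#top", {"dbkey": "x1", "stype": "id", "s": "27"}))
# htp://my_url.aspx?dbkey=x1&stype=id&s=27#top
```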