
I am trying to learn how scrapy's FormRequest works on a website. I have the following scrapy code:

import scrapy
import json
from scrapy.utils.response import open_in_browser

class Test(scrapy.Spider):
    name = 'go2'
    def start_requests(self):
        url = 'http://www.webscrapingfordatascience.com/jsonajax/results2.php'
        payload = {'api_code': 'C123456'}
        yield scrapy.FormRequest(url,formdata=json.dumps(payload),headers={'Content-Type': 'application/json'})
        #yield scrapy.FormRequest(url,formdata=payload)  # dict object not allowed?
    def parse(self,response):
        #print(response.text)
        open_in_browser(response)

I can't seem to get the right response. I first tried using a dictionary, but it didn't work. Then I tested with requests as follows, and both of those attempts work.

import requests
import json
url = 'http://www.webscrapingfordatascience.com/jsonajax/results2.php'
payload={'api_code': 'C123456'}
res = requests.post(url, json=payload)
res2 = requests.post(url, data=json.dumps(payload))
#res3 = requests.post(url, data=payload) doesn't work

FormRequest takes key/value pairs, not a string, which is why json.dumps() throws an error. My question is: how can I get FormRequest (or any other scrapy method) to work on this example, i.e. get the same results as requests?

I believe res3 = requests.post(url, data=payload) is the same as FormRequest(url, formdata=payload), which is why it is not working.
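
To check that claim, here is a quick comparison of the request bodies each call builds (this only inspects the prepared requests locally; it does not hit the server). Everything in it comes from the snippets above:

import json
import requests
import scrapy

url = 'http://www.webscrapingfordatascience.com/jsonajax/results2.php'
payload = {'api_code': 'C123456'}

# Body requests builds from data=payload (urlencoded form data)
print(requests.Request('POST', url, data=payload).prepare().body)  # 'api_code=C123456'

# Body scrapy's FormRequest builds from formdata=payload (also urlencoded)
print(scrapy.FormRequest(url, formdata=payload).body)  # b'api_code=C123456'

# Body the endpoint actually accepts (raw JSON), per the working calls above
print(json.dumps(payload))  # '{"api_code": "C123456"}'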

hadesfv

2 Answers


According to the scrapy docs, dict objects are allowed.

And your code works OK.
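
For example, passing the dict straight in builds a urlencoded POST without errors (a minimal sketch; it only constructs the request, it does not send it, and whether this particular endpoint accepts urlencoded form data is a separate question, see the updates below):

import scrapy

# formdata as a dict: scrapy urlencodes it and sends it as a POST with
# Content-Type: application/x-www-form-urlencoded
request = scrapy.FormRequest(
    'http://www.webscrapingfordatascience.com/jsonajax/results2.php',
    formdata={'api_code': 'C123456'},
)
print(request.method, request.body)  # POST b'api_code=C123456'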

Update (no longer relevant, as the problem was in the request body, not in the headers)

I used the Fiddler debugging proxy to compare the requests and responses made by the different libraries.

(screenshots omitted: the request sent by requests vs. the request sent by scrapy)

As you can see, scrapy and the requests library send the request with different headers. If you want scrapy to send the valid request, you need to modify your headers.

UPDATE_2

import scrapy
import json

class Test(scrapy.Spider):
    name = 'go2'

    def start_requests(self):
        url = 'http://www.webscrapingfordatascience.com/jsonajax/results2.php'
        payload = {'api_code': 'C123456'}
        # Set the raw request body to the JSON string instead of using formdata
        yield scrapy.FormRequest(url, body=json.dumps(payload))

    def parse(self, response):
        print(response.text)
        #open_in_browser(response)
Georgiy

It's a common trap for scrapy users. FormRequest forms a urlencoded payload from the dict, e.g.:

from urllib.parse import urlencode

a = {'key1': 'value1', 'key2': 'value2'}
urlencode(a)
# Result: 'key1=value1&key2=value2'

In your case you should use the regular Request class with body=json.dumps(your_dict).
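
A minimal sketch of that approach (the spider name is just a placeholder, and the explicit Content-Type header is an extra added for clarity; the working requests calls in the question suggest the endpoint accepts a raw JSON body even without it):

import json
import scrapy

class JsonPostSpider(scrapy.Spider):
    name = 'json_post'  # placeholder name

    def start_requests(self):
        url = 'http://www.webscrapingfordatascience.com/jsonajax/results2.php'
        payload = {'api_code': 'C123456'}
        # Plain Request: choose the method and serialize the JSON body yourself
        yield scrapy.Request(
            url,
            method='POST',
            body=json.dumps(payload),
            headers={'Content-Type': 'application/json'},
            callback=self.parse,
        )

    def parse(self, response):
        print(response.text)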

Michael Savchenko