
I am trying to use requests to fetch a page and then pass the response object to a parser, but I ran into a problem:

def start_requests(self):
    yield self.parse(requests.get(url))

def parse(self, response):
    pass

builtins.AttributeError: 'generator' object has no attribute 'dont_filter'


3 Answers


You first need to download the page's response with requests and then wrap that string in an HtmlResponse object:

import requests
from scrapy.http import HtmlResponse

resp = requests.get(url)

# wrap the downloaded HTML in a Scrapy response object
response = HtmlResponse(url="", body=resp.text, encoding='utf-8')
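For instance (the URL below is just a placeholder for illustration), the wrapped response gives you Scrapy's usual selector API, so you can query it exactly like a response Scrapy downloaded itself:

import requests
from scrapy.http import HtmlResponse

url = "http://quotes.toscrape.com"  # placeholder URL
resp = requests.get(url)

# same wrapping as above, but keeping the real URL on the response
response = HtmlResponse(url=url, body=resp.text, encoding='utf-8')

print(response.css("title::text").get())         # page title
print(response.xpath("//a/@href").getall()[:5])  # first few links

Note that a response built this way is meant to be parsed directly (passed to your parse method or queried with selectors); yielding it from start_requests still fails, because Scrapy only accepts Request objects there.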
Umair Ayub
  • Hey, I have the same problem as OP. I tried your code (response = HtmlResponse(url="", body=requests.get(line).text, encoding='utf-8') followed by yield response) and got this error: 'HtmlResponse' object has no attribute 'dont_filter' – KEYAN TECH May 21 '19 at 16:42

What you need to do is:

  1. Get the page with python requests and save it to a variable other than the Scrapy response.

r = requests.get(url)

  2. Replace the Scrapy response body with the text from your python requests response.

response = response.replace(body=r.text)

That's it. Now you have a Scrapy response object with all the data that python requests fetched.
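As a rough sketch of where this fits (the spider class and URL below are placeholders, not part of the answer), the swap happens inside a normal callback where a Scrapy response already exists:

import requests
import scrapy

class ExampleSpider(scrapy.Spider):
    # hypothetical spider, only to show the pattern in context
    name = 'example'
    start_urls = ['http://quotes.toscrape.com']  # placeholder URL

    def parse(self, response):
        # step 1: re-fetch the page with python requests
        # (for example with different headers or cookies than Scrapy used)
        r = requests.get(response.url)

        # step 2: swap the body - same Scrapy response object and API,
        # now holding the HTML that requests downloaded
        response = response.replace(body=r.text)

        yield {'title': response.css('title::text').get()}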

Billy Jhon

yield returns a generator, so it gets iterated over before the request gets the data. You can remove the yield and it should work; I have tested it with a sample URL.

def start_requests(self):
    self.parse(requests.get(url))

def parse(self, response):
    pass
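To see why the yield version fails, here is a quick plain-Python illustration (no Scrapy involved): calling a generator function does not run its body, it just hands back a generator object, and that object has none of the attributes Scrapy expects on a Request:

def parse(response):
    yield response  # any function containing yield is a generator function

result = parse('fake response')
print(type(result))                    # <class 'generator'>
print(hasattr(result, 'dont_filter'))  # False, hence the AttributeError

That generator is what start_requests ends up yielding to Scrapy, which is why the dont_filter lookup blows up.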
Ikram Khan Niazi