
I am trying to use requests to fetch a page and then pass the response object to a parser, but I ran into a problem:

def start_requests(self):
    yield self.parse(requests.get(url))

def parse(self, response):
    pass

builtins.AttributeError: 'generator' object has no attribute 'dont_filter'


3 Answers


You first need to download the page's response with requests and then wrap that string in an HtmlResponse object:

import requests
from scrapy.http import HtmlResponse

resp = requests.get(url)

# wrap the downloaded HTML in a Scrapy response object
response = HtmlResponse(url="", body=resp.text, encoding='utf-8')
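For instance (the URL below is just a placeholder for illustration), the wrapped response gives you Scrapy's usual selector API, so you can query it exactly like a response Scrapy downloaded itself:

import requests
from scrapy.http import HtmlResponse

url = "http://quotes.toscrape.com"  # placeholder URL
resp = requests.get(url)

# same wrapping as above, but keeping the real URL on the response
response = HtmlResponse(url=url, body=resp.text, encoding='utf-8')

print(response.css("title::text").get())         # page title
print(response.xpath("//a/@href").getall()[:5])  # first few links

Note that a response built this way is meant to be parsed directly (passed to your parse method or queried with selectors); yielding it from start_requests still fails, because Scrapy only accepts Request objects there.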
Umair Ayub
  • Hey, I have the same problem as OP. I tried your code (response = HtmlResponse(url="", body=requests.get(line).text, encoding='utf-8') followed by yield response) and got this error: 'HtmlResponse' object has no attribute 'dont_filter' – KEYAN TECH May 21 '19 at 16:42

What you need to do is:

  1. Get the page with python requests and save it to a variable other than the Scrapy response.

r = requests.get(url)

  2. Replace the Scrapy response body with the text from your python requests response.

response = response.replace(body=r.text)

That's it. Now you have a Scrapy response object with all the data that python requests fetched.
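As a rough sketch of where this fits (the spider class and URL below are placeholders, not part of the answer), the swap happens inside a normal callback where a Scrapy response already exists:

import requests
import scrapy

class ExampleSpider(scrapy.Spider):
    # hypothetical spider, only to show the pattern in context
    name = 'example'
    start_urls = ['http://quotes.toscrape.com']  # placeholder URL

    def parse(self, response):
        # step 1: re-fetch the page with python requests
        # (for example with different headers or cookies than Scrapy used)
        r = requests.get(response.url)

        # step 2: swap the body - same Scrapy response object and API,
        # now holding the HTML that requests downloaded
        response = response.replace(body=r.text)

        yield {'title': response.css('title::text').get()}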

Billy Jhon

yield returns a generator, so it gets iterated over before the request gets the data. You can remove the yield and it should work; I have tested it with a sample URL.

def start_requests(self):
    self.parse(requests.get(url))

def parse(self, response):
    pass
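To see why the yield version fails, here is a quick plain-Python illustration (no Scrapy involved): calling a generator function does not run its body, it just hands back a generator object, and that object has none of the attributes Scrapy expects on a Request:

def parse(response):
    yield response  # any function containing yield is a generator function

result = parse('fake response')
print(type(result))                    # <class 'generator'>
print(hasattr(result, 'dont_filter'))  # False, hence the AttributeError

That generator is what start_requests ends up yielding to Scrapy, which is why the dont_filter lookup blows up.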
Ikram Khan Niazi