
I'm trying to run this code: the webdriver opens the page, but soon after it stops working and I receive an error: AttributeError: 'dict' object has no attribute 'dont_filter'. This is my code:

import scrapy
from scrapy import Spider
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from scrapy.selector import Selector
from scrapy.http import Request



class RentalMarketSpider(Spider):
    name = 'rental_market'
    allowed_domains = ['home.co.uk']
    

    def start_requests(self):
        s=Service('/Users/chrisb/Desktop/Scrape/Home/chromedriver')
        self.driver = webdriver.Chrome(service=s)
        self.driver.get('https://www.home.co.uk/for_rent/ampthill/current_rents?location=ampthill')
        sel = Selector(text=self.driver.page_source)

        tot_prop_rent = sel.xpath('.//div[1]/table/tbody/tr[1]/td[2]/text()').extract_first()
        last_14_days = sel.xpath('.//div[1]/table/tbody/tr[2]/td[2]/text()').extract_first()
        average = sel.xpath('.//div[1]/table/tbody/tr[3]/td[2]/text()').extract_first()
        median = sel.xpath('.//div[1]/table/tbody/tr[4]/td[2]/text()').extract_first()

        one_b_num_prop = sel.xpath('.//div[3]/table/tbody/tr[2]/td[2]/text()').extract_first()
        one_b_average = sel.xpath('.//div[3]/table/tbody/tr[2]/td[3]/text()').extract_first()
        
        yield {
                'tot_prop_rent': tot_prop_rent,
                'last_14_days': last_14_days,
                'average': average,
                'median': median,
                'one_b_num_prop': one_b_num_prop,
                'one_b_average': one_b_average
            }

Below is the full error I receive. I looked everywhere but couldn't find a clear answer in order to get rid of this error:

2021-12-23 17:43:26 [twisted] CRITICAL: Unhandled Error
Traceback (most recent call last):
  File "/Users/chrisb/opt/anaconda3/lib/python3.8/site-packages/scrapy/commands/crawl.py", line 27, in run
    self.crawler_process.start()
  File "/Users/chrisb/opt/anaconda3/lib/python3.8/site-packages/scrapy/crawler.py", line 327, in start
    reactor.run(installSignalHandlers=False)  # blocking call
  File "/Users/chrisb/opt/anaconda3/lib/python3.8/site-packages/twisted/internet/base.py", line 1318, in run
    self.mainLoop()
  File "/Users/chrisb/opt/anaconda3/lib/python3.8/site-packages/twisted/internet/base.py", line 1328, in mainLoop
    reactorBaseSelf.runUntilCurrent()
--- <exception caught here> ---
  File "/Users/chrisb/opt/anaconda3/lib/python3.8/site-packages/twisted/internet/base.py", line 994, in runUntilCurrent
    call.func(*call.args, **call.kw)
  File "/Users/chrisb/opt/anaconda3/lib/python3.8/site-packages/scrapy/utils/reactor.py", line 50, in __call__
    return self._func(*self._a, **self._kw)
  File "/Users/chrisb/opt/anaconda3/lib/python3.8/site-packages/scrapy/core/engine.py", line 137, in _next_request
    self.crawl(request, spider)
  File "/Users/chrisb/opt/anaconda3/lib/python3.8/site-packages/scrapy/core/engine.py", line 218, in crawl
    self.schedule(request, spider)
  File "/Users/chrisb/opt/anaconda3/lib/python3.8/site-packages/scrapy/core/engine.py", line 223, in schedule
    if not self.slot.scheduler.enqueue_request(request):
  File "/Users/chrisb/opt/anaconda3/lib/python3.8/site-packages/scrapy/core/scheduler.py", line 78, in enqueue_request
    if not request.dont_filter and self.df.request_seen(request):
builtins.AttributeError: 'dict' object has no attribute 'dont_filter'

2021-12-23 17:43:26 [scrapy.core.engine] INFO: Closing spider (finished)

Any advice would be appreciated. Thanks for your time.

2 Answers


start_requests is supposed to yield individual Request objects, not a dict.
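To see why Scrapy raises exactly this error, here is a minimal, self-contained illustration. FakeScheduler and FakeRequest are stand-ins I made up to mirror the check in scrapy/core/scheduler.py from the traceback; this is not Scrapy itself:

```python
class FakeScheduler:
    """Stand-in for Scrapy's scheduler: it reads .dont_filter on
    everything that start_requests yields."""
    def enqueue_request(self, request):
        # mirrors: if not request.dont_filter and self.df.request_seen(request)
        return not request.dont_filter

class FakeRequest:
    """Stand-in for scrapy.Request, which defines dont_filter."""
    dont_filter = False

scheduler = FakeScheduler()
scheduler.enqueue_request(FakeRequest())  # a Request-like object works fine

try:
    # yielding a plain dict from start_requests hits the same check
    scheduler.enqueue_request({'average': '£1,000 pcm'})
except AttributeError as e:
    print(e)  # 'dict' object has no attribute 'dont_filter'
```

A dict yielded from a callback like parse() is treated as an item and never reaches the scheduler, which is why moving the yield into a callback fixes the crash.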

I was facing the same issue. Moving the Selenium work into a separate callback makes it work. This solution is not the best, but it is better than nothing:

def start_requests(self):
    yield scrapy.Request(url='https://scrapy.org/', callback=self.parse)

def parse(self, response):
    s = Service('/Users/chrisb/Desktop/Scrape/Home/chromedriver')
    self.driver = webdriver.Chrome(service=s)
    self.driver.get('https://www.home.co.uk/for_rent/ampthill/current_rents?location=ampthill')
    sel = Selector(text=self.driver.page_source)

    tot_prop_rent = sel.xpath('.//div[1]/table/tbody/tr[1]/td[2]/text()').extract_first()
    last_14_days = sel.xpath('.//div[1]/table/tbody/tr[2]/td[2]/text()').extract_first()
    average = sel.xpath('.//div[1]/table/tbody/tr[3]/td[2]/text()').extract_first()
    median = sel.xpath('.//div[1]/table/tbody/tr[4]/td[2]/text()').extract_first()

    one_b_num_prop = sel.xpath('.//div[3]/table/tbody/tr[2]/td[2]/text()').extract_first()
    one_b_average = sel.xpath('.//div[3]/table/tbody/tr[2]/td[3]/text()').extract_first()

    yield {
        'tot_prop_rent': tot_prop_rent,
        'last_14_days': last_14_days,
        'average': average,
        'median': median,
        'one_b_num_prop': one_b_num_prop,
        'one_b_average': one_b_average
    }

I have just tested it and it worked.
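One detail this answer leaves out: the Chrome driver is never quit, so a chromedriver process can linger after the crawl finishes. A common pattern (a sketch of my own, not part of the tested answer above) is to quit it in the spider's closed() hook, which Scrapy calls automatically when the spider finishes. A stand-in driver is used here so the pattern can be shown without a real browser:

```python
class FakeDriver:
    """Stand-in for selenium's webdriver.Chrome; only quit() matters here."""
    def __init__(self):
        self.quit_called = False

    def quit(self):
        self.quit_called = True

class RentalMarketSpider:  # sketch; the real spider subclasses scrapy.Spider
    def parse(self, response=None):
        # in the real code: self.driver = webdriver.Chrome(service=s)
        self.driver = FakeDriver()

    def closed(self, reason):
        # Scrapy calls closed() when the spider finishes; quit the
        # browser here so chromedriver does not keep running.
        if getattr(self, 'driver', None) is not None:
            self.driver.quit()

spider = RentalMarketSpider()
spider.parse()
spider.closed('finished')
print(spider.driver.quit_called)  # True
```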

halfer
Ahmed Ellban

I don't see anything wrong with your code as such. Possibly you are using an old version of ChromeDriver, which returns a wrongly shaped object.


Solution

Ensure that:


tl;dr

The FIND_ELEMENT command returns a dict object value.

undetected Selenium
  • 183,867
  • 41
  • 278
  • 352