
I always get NotImplementedError('{}.parse callback is not defined'.format(self.__class__.__name__)), even though I tried to follow the example here.

2019-12-27 11:40:40 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://latindancecalendar.com/festivals/> (referer: None)
ERROR:scrapy.core.scraper:Spider error processing <GET https://latindancecalendar.com/festivals/> (referer: None)
Traceback (most recent call last):
  File "/Users/Marc/.local/share/virtualenvs/scrapy-Qon0LmmU/lib/python3.7/site-packages/twisted/internet/defer.py", line 654, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File "/Users/Marc/.local/share/virtualenvs/scrapy-Qon0LmmU/lib/python3.7/site-packages/scrapy/spiders/__init__.py", line 80, in parse
    raise NotImplementedError('{}.parse callback is not defined'.format(self.__class__.__name__))
NotImplementedError: LatindancecalendarSpider.parse callback is not defined
2019-12-27 11:40:40 [scrapy.core.scraper] ERROR: Spider error processing <GET https://latindancecalendar.com/festivals/> (referer: None)
Traceback (most recent call last):
  File "/Users/Marc/.local/share/virtualenvs/scrapy-Qon0LmmU/lib/python3.7/site-packages/twisted/internet/defer.py", line 654, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File "/Users/Marc/.local/share/virtualenvs/scrapy-Qon0LmmU/lib/python3.7/site-packages/scrapy/spiders/__init__.py", line 80, in parse
    raise NotImplementedError('{}.parse callback is not defined'.format(self.__class__.__name__))
NotImplementedError: LatindancecalendarSpider.parse callback is not defined

spider.py

class LatindancecalendarSpider(scrapy.Spider):
    name = "latindancecalendar"
    allowed_domains = ["latindancecalendar.com"]
    start_urls = ["https://latindancecalendar.com/festivals/"]

    rules = (
        Rule(
            LinkExtractor(
                restrict_xpaths=("//div[@class='eventline event_details']/a")
            ),
            callback="parse_event",
        ),
    )

    def xpath_get(self, response, key):
        # Get page title: //h1[@class="page-title"]/text()
        xpath_ = {
            "name": ".//div[@class='eventline event_details']/a/text()",
            "event_link": ".//div[@class='eventline event_details']/a/@href",
            "date": "//div[@class='vevent']/div/span/b/text()",
            "website": "//div[@class='vevent']/div[@class='top_menu_wrapper']/div[@class='top_menu']/a[text()='Website']/@href",
            "facebook_event": "//div[@class='vevent']/div[@class='top_menu_wrapper']/div[@class='top_menu']/a[text()='Facebook Event']/@href",
            "city_country": "//div[@class='vevent']/div/div[span = 'Location: ']/text()",
        }
        return response.xpath(xpath_.get(key)).get()

    def parse_event(self, response):
        event = LatinDanceCalendarItem()
        event["name"] = "ABC"
        event["date"] = self.xpath_get(response, "date")
        event["website"] = self.xpath_get(response, "website")
        event["facebook_event"] = self.xpath_get(response, "facebook_event")
        yield event  # Will go to pipeline
Joey Coder

5 Answers


The Spider class requires a parse method. If you use rules with a custom callback (e.g. parse_event), the spider must subclass CrawlSpider instead. Change

class LatindancecalendarSpider(scrapy.Spider):

to

from scrapy.spiders import CrawlSpider, Rule
class LatindancecalendarSpider(CrawlSpider):

See also: Parse callback is not defined - Simple Webscraper (Scrapy) still not running

Yuko Kanai

The example in the link uses CrawlSpider, which defines parse() itself, but you use Spider, which only has

def parse(self, response):
    raise NotImplementedError('{}.parse callback is not defined'.format(self.__class__.__name__))

so you have to define your own parse() method in your class.
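
To see why the error appears, here is a minimal, Scrapy-free sketch of the pattern. BaseSpider, BrokenSpider, FixedSpider and error_message are hypothetical stand-ins, not Scrapy's real classes; only the error message is copied from the traceback in the question.

```python
class BaseSpider:
    """Hypothetical stand-in for scrapy.Spider (not the real class)."""

    def parse(self, response):
        # Same message as in the traceback from the question.
        raise NotImplementedError(
            '{}.parse callback is not defined'.format(self.__class__.__name__)
        )


class BrokenSpider(BaseSpider):
    """Defines only a custom callback, so parse() is inherited unchanged."""

    def parse_event(self, response):
        return {"name": response}


class FixedSpider(BaseSpider):
    """Overrides parse(), so the base-class error is never reached."""

    def parse(self, response):
        return {"name": response}


def error_message(spider, response):
    """Call spider.parse and return the error text, or None on success."""
    try:
        spider.parse(response)
    except NotImplementedError as exc:
        return str(exc)
    return None


if __name__ == "__main__":
    print(error_message(BrokenSpider(), "page"))
    print(error_message(FixedSpider(), "page"))
```

BrokenSpider mirrors the spider in the question: it has parse_event but no parse, so any response routed to parse ends in the base-class exception.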


BTW: running

import scrapy.spiders

print(scrapy.spiders.__file__)

prints the path to the source file, so you can read the code yourself.

furas

I had the same error a while back. Just rename the parse_event method to the default name, parse, since parse is mandatory when you use the Spider class.

ranasaurus

It's because you didn't define a parse method in your code.

When you use start_urls, the default start_requests implementation generates Request(url, dont_filter=True) for each URL in start_urls, and a request without an explicit callback is handled by the spider's parse method. That is why Scrapy looks for parse in your code.

So either define a parse method, or override Scrapy's start_requests method and pass your desired function as the callback there.

def start_requests(self):
    # The variable name must match the one passed to Request (url, not URL)
    url = "https://latindancecalendar.com/festivals/"
    yield scrapy.Request(url, callback=self.parse_event)
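
The callback fallback described above can be sketched without Scrapy. Everything here (Request, MiniSpider, EventSpider, crawl) is a hypothetical toy model, not Scrapy's implementation; it only assumes the behaviour stated in this answer, namely that a request without a callback is handed to parse.

```python
class Request:
    """Hypothetical stand-in for scrapy.Request: a URL plus an optional callback."""

    def __init__(self, url, callback=None):
        self.url = url
        self.callback = callback


class MiniSpider:
    """Hypothetical base spider modelling the default behaviour."""

    start_urls = []

    def start_requests(self):
        # Default: one request per start URL, with no explicit callback.
        for url in self.start_urls:
            yield Request(url)

    def parse(self, response):
        raise NotImplementedError(
            '{}.parse callback is not defined'.format(self.__class__.__name__)
        )


class EventSpider(MiniSpider):
    """Overrides start_requests so parse() is never needed."""

    start_urls = ["https://latindancecalendar.com/festivals/"]

    def start_requests(self):
        for url in self.start_urls:
            yield Request(url, callback=self.parse_event)

    def parse_event(self, response):
        return response


def crawl(spider):
    """Toy engine: fake a response per request and dispatch it to the callback."""
    results = []
    for request in spider.start_requests():
        # Fall back to spider.parse when the request carries no callback.
        callback = request.callback or spider.parse
        results.append(callback("response for " + request.url))
    return results


if __name__ == "__main__":
    print(crawl(EventSpider()))
```

With the explicit callback, the toy engine never touches parse; removing the start_requests override from EventSpider would bring the NotImplementedError back.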

You probably have more than one spider with the same name.
Try changing
name = "latindancecalendar"
to something else, like
name = "latindancecalendar_2"