I am having some trouble with a Scrapy pipeline. My data is being scraped from the sites fine and the process_item method is being called correctly. However, the spider_opened and spider_closed methods are not being called.

class MyPipeline(object):

    def __init__(self):
        log.msg("Initializing Pipeline")
        self.conn = None
        self.cur = None

    def spider_opened(self, spider):
        log.msg("Pipeline.spider_opened called", level=log.DEBUG)

    def spider_closed(self, spider):
        log.msg("Pipeline.spider_closed called", level=log.DEBUG)

    def process_item(self, item, spider):
        log.msg("Processsing item " + item['title'], level=log.DEBUG)

Both the __init__ and process_item logging messages are displayed in the log, but the spider_opened and spider_closed logging messages are not.

I need to use the spider_opened and spider_closed methods as I want to use them to open and close a connection to a database, but nothing is showing up in the log for them.

If anyone has any suggestions, that would be very useful.

Jim Jeffries

2 Answers

Sorry, found it just after I posted this. You have to add:

dispatcher.connect(self.spider_opened, signals.spider_opened)
dispatcher.connect(self.spider_closed, signals.spider_closed)

in __init__; otherwise the pipeline never receives the signals that trigger these methods.
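
For newer Scrapy releases, where `scrapy.xlib.pydispatch` is (as far as I know) no longer shipped, the same wiring can probably be done through the crawler's signal manager in a `from_crawler` classmethod. The following is only a sketch under that assumption: the class name `SignalPipeline` and the attributes `is_open` and `close_reason` are made up for illustration, while `crawler.signals.connect`, `signals.spider_opened` and `signals.spider_closed` are the Scrapy names it relies on.

```python
import logging

logger = logging.getLogger(__name__)


class SignalPipeline:
    """Hypothetical pipeline that reacts to the spider_opened/spider_closed signals."""

    def __init__(self):
        self.is_open = False       # illustrative state, not part of Scrapy
        self.close_reason = None   # filled in by the spider_closed handler

    @classmethod
    def from_crawler(cls, crawler):
        # Imported here so the class can also be loaded and unit-tested
        # in an environment where Scrapy is not installed.
        from scrapy import signals

        pipeline = cls()
        # Register the handlers explicitly; Scrapy does not call methods
        # named spider_opened/spider_closed on a pipeline by itself.
        crawler.signals.connect(pipeline.spider_opened, signal=signals.spider_opened)
        crawler.signals.connect(pipeline.spider_closed, signal=signals.spider_closed)
        return pipeline

    def spider_opened(self, spider):
        self.is_open = True
        logger.debug("Pipeline.spider_opened called")

    def spider_closed(self, spider, reason):
        # The spider_closed signal also passes the reason the spider stopped.
        self.is_open = False
        self.close_reason = reason
        logger.debug("Pipeline.spider_closed called (reason: %s)", reason)

    def process_item(self, item, spider):
        # Return the item so later pipelines still receive it.
        return item
```

Scrapy calls `from_crawler` itself when it builds the pipeline, so nothing else needs to change in the `ITEM_PIPELINES` setting.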

Jim Jeffries
  • Thanks for your answer, but where do you get the `dispatcher` variable? And how come I can't find this in http://doc.scrapy.org/en/latest/topics/item-pipeline.html? :( – wrongusername Oct 08 '12 at 18:05
  • For this to work, you need the following imports: `from scrapy.xlib.pydispatch import dispatcher` and `from scrapy import signals` – herrherr Oct 28 '13 at 15:08

The proper method names are open_spider and close_spider, not spider_opened and spider_closed. This is documented here: http://doc.scrapy.org/en/latest/topics/item-pipeline.html#writing-your-own-item-pipeline.
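
A minimal sketch of what that could look like for the database use case in the question. The class name `SQLitePipeline`, the `items` table and its `title` column are made up for illustration, and SQLite from the standard library stands in for whatever database is actually used:

```python
import sqlite3


class SQLitePipeline:
    """Hypothetical pipeline; the table and column names are illustrative only."""

    def __init__(self, db_path=":memory:"):
        self.db_path = db_path
        self.conn = None
        self.cur = None

    def open_spider(self, spider):
        # Scrapy calls this once when the spider is opened.
        self.conn = sqlite3.connect(self.db_path)
        self.cur = self.conn.cursor()
        self.cur.execute("CREATE TABLE IF NOT EXISTS items (title TEXT)")

    def close_spider(self, spider):
        # Scrapy calls this once when the spider is closed.
        self.conn.commit()
        self.conn.close()
        self.conn = None
        self.cur = None

    def process_item(self, item, spider):
        self.cur.execute("INSERT INTO items (title) VALUES (?)", (item["title"],))
        # Return the item so later pipelines still receive it.
        return item
```

No signal registration is needed here, because these two method names are part of the item pipeline interface.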

Mikhail Korobov
  • This is incorrect. `spider_opened` and `spider_closed` are signals, not methods, as documented here http://doc.scrapy.org/en/latest/topics/signals.html?highlight=spider_opened#std:signal-spider_opened and here http://doc.scrapy.org/en/latest/topics/signals.html?highlight=spider_closed#std:signal-spider_closed – Jim Jeffries May 14 '14 at 14:21
  • There are methods named open_spider and close_spider; however, they are not related to the question. – Jim Jeffries May 14 '14 at 14:30
  • Right, but why do you create spider_opened and spider_closed methods in your example pipeline and expect them to be called? And why do you need to attach signals manually if there are methods that are already being called? – Mikhail Korobov May 14 '14 at 16:07
  • I want to do something when these things happen, not call them – Jim Jeffries May 14 '14 at 16:19
  • Why can't you do these things in the `open_spider` and `close_spider` methods, which are called when the spider is opened or closed? – Mikhail Korobov May 14 '14 at 20:30
  • Because the [`spider_closed`](https://doc.scrapy.org/en/latest/topics/signals.html#scrapy.signals.spider_closed) signal provides a `reason` parameter, for example. – user Oct 07 '18 at 20:58
  • @user the question was about why the methods in the example don't work. They don't work because they're named wrongly. It's true that you can also use signals, though that is a separate feature. – Mikhail Korobov Oct 08 '18 at 10:42