
I have an extension that attaches to the spider_opened and spider_closed signals. The spider_opened handler is called correctly, but the spider_closed handler is not. I close the spider by calling scrapyd's cancel endpoint.

import os

from scrapy import signals

# `engine` is a database engine (e.g. SQLAlchemy) created elsewhere in the project.

class SpiderCtlExtension(object):

    @classmethod
    def from_crawler(cls, crawler):
        ext = cls()

        ext.project_name = crawler.settings.get('BOT_NAME')
        crawler.signals.connect(ext.spider_opened, signal=signals.spider_opened)
        crawler.signals.connect(ext.spider_closed, signal=signals.spider_closed)

        return ext

    def spider_opened(self, spider):
        sql = """UPDATE ctl_crawler
                 SET status = 'RUNNING'
                 WHERE jobid = '{}'""".format(os.getenv("SCRAPY_JOB"))
        engine.execute(sql)

    def spider_closed(self, spider, reason):
        sql = """UPDATE ctl_crawler
                 SET status = '{}'
                 WHERE jobid = '{}'""".format(reason.upper(), os.getenv("SCRAPY_JOB"))
        engine.execute(sql)

Am I doing something wrong here?

kutschkem
  • Are you sure it is not called at all? Have you tried to put a print statement inside the signal handler for debugging purposes? – alecxe Feb 27 '15 at 19:21
  • Also, are there any other spider_closed signals defined in the project? – alecxe Feb 27 '15 at 19:21
  • @alecxe Not that I know of. And as to whether it is called, at least the database entry is not updated, where it is updated in the spider_opened method. – kutschkem Feb 27 '15 at 20:40

1 Answer


This is a (Windows-specific) bug; see my bug report: https://github.com/scrapy/scrapyd/issues/83

The reason is that, because of the way the cancel method works, no shutdown handlers are called in the spider process.
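
As a possible workaround (just a sketch, not something scrapyd provides): since the spider process is killed before spider_closed can run, the job status can be updated by whatever code issues the cancel request instead. The helper name cancel_and_mark is made up, and the engine argument is assumed to be the same database engine the extension uses; the cancel.json endpoint with project/job parameters is the standard scrapyd API.

import requests

def cancel_and_mark(engine, project, job_id, scrapyd_url="http://localhost:6800"):
    # Ask scrapyd to cancel the job. On Windows this kills the spider
    # process outright, so its spider_closed handler never runs.
    resp = requests.post(
        "{}/cancel.json".format(scrapyd_url),
        data={"project": project, "job": job_id},
    )
    resp.raise_for_status()

    # Record the final state here, since the extension cannot do it.
    sql = """UPDATE ctl_crawler
             SET status = 'CANCELLED'
             WHERE jobid = '{}'""".format(job_id)
    engine.execute(sql)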

kutschkem