ReactorNotRestartable error when running two spiders sequentially using CrawlerProcess

Question

I'm trying to run two spiders sequentially. Here is the structure of my module:

import scrapy
from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings

class tmallSpider(scrapy.Spider):
    name = 'tspider'
    ...

class jdSpider(scrapy.Spider):
    name = 'jspider'
    ...

process = CrawlerProcess(get_project_settings())
process.crawl('tspider')
process.crawl('jspider')
process.start(stop_after_crawl=False)

When I run this, I get this error:

raise error.ReactorNotRestartable()
twisted.internet.error.ReactorNotRestartable

When I scroll up in the terminal, I see that both spiders ran successfully and the data I want was scraped. However, the error occurs at the end, and I guess it's because the process can't terminate? I tried process.stop but it does not work. I also tried the code from the official guide (https://docs.scrapy.org/en/latest/topics/practices.html), but that one causes a "spider not found" error. Any ideas how to fix it?

score 1 · Answer 1 · answered Jul 09 '20 at 09:46

Have you tried CrawlerRunner and the example the docs give? CrawlerRunner is useful for running multiple spiders and for being able to stop the reactor manually.

If you have, could you provide a minimal example of that code and the exact error message you get?
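
For readers who want the sequential pattern spelled out, here is a minimal sketch based on the CrawlerRunner approach mentioned above. It chains the crawls on the Deferred that runner.crawl() returns, so each spider starts only after the previous one finishes, and it stops the reactor once at the end. The helper names chain_crawls and run_sequentially are made up for this illustration. The sketch assumes the script runs inside the Scrapy project and that the installed reactor matches the project's TWISTED_REACTOR setting, if one is set.

```python
def chain_crawls(runner, spider_classes):
    """Start the spiders one after another and return the final Deferred.

    `runner` is expected to behave like scrapy.crawler.CrawlerRunner:
    runner.crawl(spider_cls) must return a Deferred.
    """
    spider_classes = list(spider_classes)
    if not spider_classes:
        raise ValueError("need at least one spider class")

    deferred = runner.crawl(spider_classes[0])
    for spider_cls in spider_classes[1:]:
        # Bind spider_cls now. The callback itself only fires after the
        # previous crawl has finished, and because it returns a Deferred,
        # the chain waits for that crawl too.
        deferred.addCallback(lambda _result, cls=spider_cls: runner.crawl(cls))
    return deferred


def run_sequentially(spider_classes):
    """Hypothetical entry point: run the spiders in order, then stop the reactor."""
    # Imported here so chain_crawls stays importable without Scrapy installed.
    from twisted.internet import reactor
    from scrapy.crawler import CrawlerRunner
    from scrapy.utils.log import configure_logging
    from scrapy.utils.project import get_project_settings

    settings = get_project_settings()
    configure_logging(settings)
    runner = CrawlerRunner(settings)

    finished = chain_crawls(runner, spider_classes)
    # addBoth fires on success or failure, so the reactor is stopped exactly once.
    finished.addBoth(lambda _result: reactor.stop())
    reactor.run()  # blocks until reactor.stop() is called
```

Calling run_sequentially([tmallSpider, jdSpider]) once from the script's entry point should run both spiders in order. Passing the spider classes rather than their names avoids relying on the spider loader to find them.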

AaronS

Yes, I tried this: `configure_logging(); runner = CrawlerRunner(); runner.crawl('tspider'); runner.crawl('jspider'); d = runner.join(); d.addBoth(lambda _: reactor.stop()); reactor.run()`. The error I get is long, but it ends with `raise KeyError("Spider not found: {}".format(spider_name))` and `KeyError: 'Spider not found: tspider'`. – Tianhe Xie Jul 09 '20 at 09:53
Traceback (most recent call last): File "/Users/tiaxie/PycharmProjects/Tmall/venv/lib/python3.8/site-packages/scrapy/spiderloader.py", line 76, in load return self._spiders[spider_name] KeyError: 'tspider' – Tianhe Xie Jul 09 '20 at 09:53
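Have you tried getting rid of the single quotes in runner.crawl('jspider')? The crawl method accepts the spider class itself, e.g. runner.crawl(jdSpider); a string name only works if the spider loader can find that spider through the project settings. – AaronS Jul 09 '20 at 10:06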
I just fixed that, but it went back to the same issue I posted earlier: `raise error.ReactorNotRestartable() twisted.internet.error.ReactorNotRestartable`. Again, I can see the crawler getting the correct results, but this error appears at the end and the program can't terminate unless I click the stop button in PyCharm. – Tianhe Xie Jul 09 '20 at 13:21
As stated in the docs, the Twisted reactor is not restartable. What works for me is isolating each run in a separate process using multiprocessing. – Nestor Solalinde Aug 15 '20 at 05:19
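
A rough sketch of the process-isolation idea from the last comment. Note that it swaps in the standard subprocess module and Scrapy's `scrapy crawl` command instead of multiprocessing; the point is the same, since each crawl gets a fresh interpreter and therefore a fresh reactor. The function names and the project_dir parameter are hypothetical, and the commands are assumed to run from the directory that contains scrapy.cfg.

```python
import subprocess


def crawl_command(spider_name):
    """Build the argv for Scrapy's `scrapy crawl <name>` command."""
    return ["scrapy", "crawl", spider_name]


def run_each_in_own_process(commands, cwd=None):
    """Run each command to completion, in order, and return the exit codes.

    Every command runs in its own child process, so nothing (including a
    Twisted reactor) is shared between runs.
    """
    exit_codes = []
    for argv in commands:
        completed = subprocess.run(argv, cwd=cwd)  # waits for the child to exit
        exit_codes.append(completed.returncode)
    return exit_codes


def crawl_all(spider_names, project_dir=None):
    """Hypothetical convenience wrapper: crawl the named spiders one by one."""
    commands = [crawl_command(name) for name in spider_names]
    return run_each_in_own_process(commands, cwd=project_dir)
```

With the spiders from the question, crawl_all(['tspider', 'jspider']) would run the two crawls back to back; a non-zero entry in the returned list indicates that the corresponding crawl command failed.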
