Scrapy Process stopping execution after second spider completes crawling, without waiting for the first one to complete

Question

I have created a script that runs two spiders in the same process and generates the desired output. If the first spider finishes crawling before the second, I get the desired output. However, if the second spider finishes before the first, the script terminates without waiting for the first spider to finish crawling. What could be the reason? What should I change in my code?

from scrapy.utils.project import get_project_settings
from scrapy.crawler import CrawlerProcess

setting = get_project_settings()
process = CrawlerProcess(setting)

for spider_name in process.spider_loader.list():
    setting['FEED_FORMAT'] = 'json'
    setting['LOG_LEVEL'] = 'INFO'
    setting['FEED_URI'] = spider_name+'.json'
    setting['LOG_FILE'] = spider_name+'.log'
    process = CrawlerProcess(setting)
    print("Running spider %s" % spider_name)
    process.crawl(spider_name)

process.start()
print("Completed")

The “running” starts on `process.start()`. In your loop, you are overriding the `process` variable every time with a new `CrawlerProcess` instance, hence only the last spider will actually run. — Gallaecio, Jun 17 '20 at 16:03
How else do I pass a different feed URI for each crawler? This was the easiest way. Actually, both spiders are running, but the process only waits for the last spider to complete before terminating. I need a hack to save the outputs of both spiders in different JSON files — Mithil Mohan, Jun 18 '20 at 04:28
However, if I only call setting.update({ 'FEED_FORMAT': 'json', 'FEED_URI': spider_name + ".json", 'LOG_FILE': spider_name + '.log', 'LOG_LEVEL': 'INFO' }) in the loop, without creating a new CrawlerProcess, it works fine and generates the appropriate JSON output. But the log files aren't correct. Is there a way to get correct log files too? — Mithil Mohan, Jun 18 '20 at 06:42
Have you considered using `Spider.custom_settings` to define spider-specific settings? — Gallaecio, Jun 18 '20 at 10:30
Follow-up question by the original author: https://stackoverflow.com/q/62442491/939364 — Gallaecio, Jun 18 '20 at 10:33
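
Pulling the two suggestions in the comments together, here is a minimal sketch of one possible approach: create a single `CrawlerProcess`, never reassign it inside the loop, and attach the per-spider output settings through `custom_settings` on each spider class. The names `per_spider_settings` and `run_all_spiders` are hypothetical helpers, not part of Scrapy. The sketch assumes Scrapy 2.1 or later, where the `FEEDS` setting is available; on older versions the `FEED_URI` and `FEED_FORMAT` settings from the question would take its place. It does not attempt per-spider log files, since Scrapy's log handler is installed for the whole process rather than per spider.

```python
def per_spider_settings(spider_name):
    """Build output settings unique to one spider (hypothetical helper)."""
    return {
        # Assumption: Scrapy 2.1+, where FEEDS maps an output URI to its options.
        "FEEDS": {spider_name + ".json": {"format": "json"}},
    }


def run_all_spiders():
    """Schedule every spider of the project on ONE CrawlerProcess, then run."""
    # Imported lazily so the helper above is usable without Scrapy installed.
    from scrapy.crawler import CrawlerProcess
    from scrapy.utils.project import get_project_settings

    settings = get_project_settings()
    settings.set("LOG_LEVEL", "INFO")
    process = CrawlerProcess(settings)  # created once, never reassigned

    for spider_name in process.spider_loader.list():
        spider_cls = process.spider_loader.load(spider_name)
        # Merge with any custom_settings the spider class already defines.
        merged = dict(spider_cls.custom_settings or {})
        merged.update(per_spider_settings(spider_name))
        spider_cls.custom_settings = merged
        print("Running spider %s" % spider_name)
        process.crawl(spider_cls)

    # Blocks until every spider scheduled on this process has finished.
    process.start()
    print("Completed")
```

Calling `run_all_spiders()` from inside the Scrapy project directory starts the crawl. Because every `process.crawl()` call targets the same `CrawlerProcess` object, `process.start()` should wait for all of the scheduled spiders, not only the one that was scheduled last.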
