I'm trying to run Scrapy spiders from a Django management command.

The spiders run, but they don't pick up the settings from the Scrapy project.

django_project/
    django_project/
    app1/
    scraping/ # A Django app that also contains the Scrapy project
        scrapy_spider/
            settings.py
            spiders/

When I try to specify the settings module inside the command, it raises:

ModuleNotFoundError: No module named 'scrapy_spider'
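
A possible explanation, not confirmed against this project: the settings file generated by `scrapy startproject` refers to the project by its short name (for example `SPIDER_MODULES = ['scrapy_spider.spiders']`), and that name is only importable when the directory containing `scrapy_spider/` is on `sys.path`. From the Django root only the dotted path `scraping.scrapy_spider` resolves. The sketch below reproduces that import behaviour with the standard library alone, in a throwaway directory that mirrors the layout above; the helper names `build_demo_layout` and `can_import` are hypothetical.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def build_demo_layout(root: Path) -> Path:
    """Create root/scraping/scrapy_spider/settings.py and return the 'scraping' dir."""
    pkg = root / "scraping" / "scrapy_spider"
    pkg.mkdir(parents=True)
    (root / "scraping" / "__init__.py").write_text("")
    (pkg / "__init__.py").write_text("")
    (pkg / "settings.py").write_text("BOT_NAME = 'scrapy_spider'\n")
    return root / "scraping"


def can_import(name: str) -> bool:
    """Return True if the dotted module name can be imported right now."""
    importlib.invalidate_caches()
    try:
        importlib.import_module(name)
    except ModuleNotFoundError:
        return False
    return True


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    scraping_dir = build_demo_layout(root)

    # Only the Django project root is on sys.path (roughly what manage.py gives you).
    sys.path.insert(0, str(root))
    dotted_ok = can_import("scraping.scrapy_spider.settings")
    short_before = can_import("scrapy_spider.settings")

    # Assumption: also exposing the 'scraping' directory makes the short name resolve.
    sys.path.insert(0, str(scraping_dir))
    short_after = can_import("scrapy_spider.settings")

    # Undo the sys.path changes made for this demonstration.
    sys.path.remove(str(scraping_dir))
    sys.path.remove(str(root))

print(dotted_ok, short_before, short_after)
```

If this is indeed the cause, two things might be worth trying: change the short-name references in `settings.py` to the full dotted path (`scraping.scrapy_spider...`), or add the `scraping` directory to `sys.path` before the settings are loaded.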

COMMAND

import os
from django.core.management.base import BaseCommand
from scrapy.utils.project import get_project_settings

from twisted.internet import reactor, defer
from scrapy.crawler import CrawlerRunner

from scraping.scrapy_spider.spiders.autoscrape_index_spider import AutoScrapeIndexSpider
from scraping.scrapy_spider.spiders.autoscrape_spider import AutoScrapeSpider


class Command(BaseCommand):

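    def handle(self, *args, **options):
        # Point Scrapy at the settings module before get_project_settings() reads it.
        os.environ['SCRAPY_SETTINGS_MODULE'] = 'scraping.scrapy_spider.settings'
        runner = CrawlerRunner(settings=get_project_settings())

        @defer.inlineCallbacks
        def crawl():
            # Run the two spiders one after the other, then stop the reactor.
            yield runner.crawl(AutoScrapeIndexSpider)
            yield runner.crawl(AutoScrapeSpider)
            reactor.stop()

        crawl()
        reactor.run()  # Blocks until reactor.stop() is called inside crawl().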

Do you know how to make it work?

Milano