I'm following this answer to get the list of spiders in my Scrapy project from inside Django. This is what the project structure looks like:

my_app/
-- apps/  # django apps folder
   -- crawler/ 
      -- __init__.py
      -- admin.py
      -- apps.py
      -- views.py <~ the code below lives here
      -- etc..
-- my_app/  # django project folder
   -- __init__.py
   -- asgi.py
   -- settings.py
   -- etc..
-- scraper_app/ # scrapy dir
   -- scraper_app/ # scrapy project folder
      -- spiders/
         -- abc_spider.py
      -- __init__.py
      -- middlewares.py
      -- pipelines.py
      -- settings.py
      -- etc..
   -- scrapy.cfg
-- manage.py
-- scrapyd.conf
-- setup.py
-- etc..

Here is the code that lists the available spiders. It works when I run it in the Scrapy shell, but it always returns an empty list when I run it from the Django app, in views.py of the crawler app.

from scrapy import spiderloader
from scrapy.utils import project

project_settings = project.get_project_settings()
spider_loader = spiderloader.SpiderLoader.from_settings(project_settings)
spiders = spider_loader.list()

So my question is: how can I make this script work inside the Django project as well, in a Django or Scrapy way if one is available? Thanks.

EDIT: I just realized that when I inspect the value of project.get_project_settings() in the Scrapy shell, it contains

'SPIDER_MODULES': ['scraper_app.spiders']

but when I run it from Django, SPIDER_MODULES is an empty list.

2 Answers

You should integrate your scraper with Django. For example, in Django's settings.py:

import os
import sys
    
# DJANGO INTEGRATION
    
sys.path.append(os.path.dirname(os.path.abspath('.')))
os.environ['DJANGO_SETTINGS_MODULE'] = '<scrapper>.settings'
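
For the direction asked about in the question (Django code calling into Scrapy), the variable Scrapy itself reads is SCRAPY_SETTINGS_MODULE. A minimal sketch of that idea follows, assuming the layout from the question; configure_scrapy_env is a hypothetical helper name, and the default folder and module names are taken from the question's tree rather than from any API:

```python
import os
import sys
from pathlib import Path


def configure_scrapy_env(base_dir, scrapy_dir="scraper_app",
                         settings_module="scraper_app.settings"):
    """Make the Scrapy project importable and point Scrapy at its settings.

    Hypothetical helper: `scrapy_dir` is the folder holding scrapy.cfg and
    `settings_module` is the dotted path of the Scrapy settings file, both
    assumed from the directory tree in the question.
    """
    scrapy_root = str(Path(base_dir).resolve() / scrapy_dir)
    # The folder that contains scrapy.cfg has to be on sys.path so that the
    # inner scraper_app package (settings, spiders) can be imported.
    if scrapy_root not in sys.path:
        sys.path.append(scrapy_root)
    # setdefault keeps any value that was already exported in the environment.
    os.environ.setdefault("SCRAPY_SETTINGS_MODULE", settings_module)
    return scrapy_root
```

It would need to run before project.get_project_settings() is called, for example from Django's settings.py with the folder that contains manage.py as base_dir.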
Dmitry Rusanov

I know it has been a long time since I asked this question, but I finally found the answer that works best for me, and the cleanest way, without using the os module.

from scrapy.utils import project
from scrapy import spiderloader

project_settings = project.get_project_settings()
project_settings.set('SPIDER_MODULES', ['path.to.scrapy.spiders'])
project_settings.set('NEWSPIDER_MODULE', 'path.to.scrapy.spiders')
spider_loader = spiderloader.SpiderLoader.from_settings(project_settings)
spiders = spider_loader.list()

So in my solution, I just need to override the SPIDER_MODULES and NEWSPIDER_MODULE values on the settings object.
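
As for why the override is needed: as far as I understand it, get_project_settings() only loads the project's settings.py when SCRAPY_SETTINGS_MODULE is set or when it can find a scrapy.cfg, which it looks for in the current working directory and its parents (a few user-level config locations are checked as well). Started through manage.py, the working directory is usually my_app/, and scrapy.cfg sits one level below it, so nothing is found and SPIDER_MODULES stays at its empty default. The sketch below imitates that upward search with pathlib purely for illustration; find_scrapy_cfg is a hypothetical name, not part of Scrapy:

```python
from pathlib import Path


def find_scrapy_cfg(start):
    """Return the nearest scrapy.cfg at or above `start`, or None.

    Illustration only: this mimics the upward directory walk, it does not
    call into Scrapy.
    """
    start = Path(start).resolve()
    # Check the starting folder first, then each parent up to the root.
    for folder in (start, *start.parents):
        candidate = folder / "scrapy.cfg"
        if candidate.is_file():
            return candidate
    return None
```

Called with the folder that holds manage.py it should come back empty for this layout, while from inside scraper_app/ it finds the file, which matches the difference between the Django run and the Scrapy shell.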