
I'm having issues getting my Scrapy spider to work with Django when using daphne. I noticed that when I use the normal WSGI server, i.e. without adding daphne to the list of installed apps, everything works as expected. With daphne, however, my spider gets stuck after the initial requests and never reaches parse.

I have made a minimal reproduction of the issue and would be very grateful if anyone can explain why it occurs and how to resolve it. Thanks.

PS: this is just a reproduction of the issue. As for why I need daphne: my project uses Django Channels.

project structure

daphne_scrapy/
├── daphne_scrapy
│   ├── __init__.py
│   ├── asgi.py
│   ├── settings.py
│   ├── spiders.py
│   ├── urls.py
│   ├── views.py
│   └── wsgi.py
├── db.sqlite3
└── manage.py

settings.py

(these were the only changes made to settings.py: adding daphne to INSTALLED_APPS and setting ASGI_APPLICATION)

INSTALLED_APPS = [
    'daphne',
    'django.contrib.admin',
    'django.contrib.auth',
    'django.contrib.contenttypes',
    'django.contrib.sessions',
    'django.contrib.messages',
    'django.contrib.staticfiles',
]

WSGI_APPLICATION = 'daphne_scrapy.wsgi.application'
ASGI_APPLICATION = 'daphne_scrapy.asgi.application'
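asgi.py is listed in the project tree but its contents are not shown; for reference, the stock Django ASGI entry point for this layout would look like the following (a sketch based on Django's default template, not taken from the original project):

```python
# daphne_scrapy/asgi.py -- stock Django ASGI entry point (sketch; this file
# is listed in the project tree but its contents were not shown in the post).
import os

from django.core.asgi import get_asgi_application

os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'daphne_scrapy.settings')

application = get_asgi_application()
```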

spiders.py

note: I need to do the Django setup because my spider makes use of Django models in the real project I'm working on.

import scrapy
from scrapy.crawler import CrawlerProcess
import django, os

os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'daphne_scrapy.settings')
django.setup()

class MySpider(scrapy.Spider):
    name = 'my_spider'
    start_urls = ['https://jsonplaceholder.typicode.com/todos/1']

    def parse(self, response):
        data = response.json()
        print('in parse method')
        yield data

def run_spider():
    process = CrawlerProcess()
    process.crawl(MySpider)
    process.start()

run_spider()
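The post itself contains no fix, but one commonly suggested direction (an assumption on my part, not verified against this reproduction) is to configure Scrapy to use its asyncio-compatible Twisted reactor, so that it cooperates with the asyncio event loop that daphne installs. TWISTED_REACTOR is a standard Scrapy setting (Scrapy >= 2.0):

```python
from scrapy.crawler import CrawlerProcess

# MySpider is the spider class defined in the snippet above.
# Assumption: selecting the asyncio-based reactor may let Scrapy share the
# event loop daphne sets up, instead of fighting over reactor installation.
def run_spider():
    process = CrawlerProcess(settings={
        "TWISTED_REACTOR": "twisted.internet.asyncioreactor.AsyncioSelectorReactor",
    })
    process.crawl(MySpider)
    process.start()
```

This is a configuration sketch only; whether it resolves the hang with daphne in INSTALLED_APPS would need to be tested against the reproduction above.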

output when I run the spider

when I run the spider, it gets stuck at the last line below (Using selector: KqueueSelector)

...

2023-04-24 03:23:33 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
 'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
 'scrapy.spidermiddlewares.referer.RefererMiddleware',
 'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
 'scrapy.spidermiddlewares.depth.DepthMiddleware']
2023-04-24 03:23:33 [scrapy.middleware] INFO: Enabled item pipelines:
[]
2023-04-24 03:23:33 [scrapy.core.engine] INFO: Spider opened
2023-04-24 03:23:34 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2023-04-24 03:23:34 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023
2023-04-24 03:23:34 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://jsonplaceholder.typicode.com/todos/1> (referer: None)
2023-04-24 03:23:34 [asyncio] DEBUG: Using selector: KqueueSelector

  • after multiple rounds of troubleshooting, I found that the issue is with the daphne server itself. I tried switching to another async server, uvicorn, and it worked, so I guess I'll continue development and make sure to use uvicorn for deployment. – kizii Apr 30 '23 at 20:11
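For anyone reproducing the commenter's workaround: serving the same ASGI application with uvicorn instead of daphne (assuming uvicorn is installed) would look like this:

```shell
pip install uvicorn
# Serve the project's ASGI application with uvicorn instead of daphne.
uvicorn daphne_scrapy.asgi:application
```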

0 Answers