I'm having issues getting my Scrapy spider to work with Django when using Daphne. When I use the normal WSGI server (i.e. without adding daphne to INSTALLED_APPS), everything works as expected, but with Daphne my spider gets stuck after the initial request and never reaches the parse() method.
I've made a minimal reproduction of the issue below, and I'd be very grateful if anyone can explain why this happens and how to resolve it. Thanks.
PS: this is just a reproduction of the issue; as for why I need Daphne, my project uses Django Channels.
Project structure
daphne_scrapy/
├── daphne_scrapy
│ ├── __init__.py
│ ├── asgi.py
│ ├── settings.py
│ ├── spiders.py
│ ├── urls.py
│ ├── views.py
│ └── wsgi.py
├── db.sqlite3
└── manage.py
settings.py
(these were the only changes made to settings.py: added daphne to INSTALLED_APPS and set ASGI_APPLICATION)
INSTALLED_APPS = [
'daphne',
'django.contrib.admin',
'django.contrib.auth',
'django.contrib.contenttypes',
'django.contrib.sessions',
'django.contrib.messages',
'django.contrib.staticfiles',
]
WSGI_APPLICATION = 'daphne_scrapy.wsgi.application'
ASGI_APPLICATION = 'daphne_scrapy.asgi.application'
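For completeness: I'm assuming asgi.py here is the stock file that django-admin startproject generates (I haven't listed mine, so this is the standard template, not necessarily a byte-for-byte copy of my project's file):

```python
# daphne_scrapy/asgi.py -- default Django boilerplate (assumed unmodified)
import os

from django.core.asgi import get_asgi_application

os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'daphne_scrapy.settings')

application = get_asgi_application()
```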
spiders.py
Note: I need to do the Django setup because my spider makes use of Django models in the real project I'm working on.
import os

import django
import scrapy
from scrapy.crawler import CrawlerProcess

# Point at the project's settings module before calling django.setup()
# (assumes the spider is run from the project root)
os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'daphne_scrapy.settings')
django.setup()


class MySpider(scrapy.Spider):
    name = 'my_spider'
    start_urls = ['https://jsonplaceholder.typicode.com/todos/1']

    def parse(self, response):
        data = response.json()
        print('in parse method')
        yield data


def run_spider():
    process = CrawlerProcess()
    process.crawl(MySpider)
    process.start()


run_spider()
Result when I run the spider
When I run the spider, it gets stuck at the last line below (Using selector: KqueueSelector) and never progresses:
...
2023-04-24 03:23:33 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
'scrapy.spidermiddlewares.referer.RefererMiddleware',
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
'scrapy.spidermiddlewares.depth.DepthMiddleware']
2023-04-24 03:23:33 [scrapy.middleware] INFO: Enabled item pipelines:
[]
2023-04-24 03:23:33 [scrapy.core.engine] INFO: Spider opened
2023-04-24 03:23:34 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2023-04-24 03:23:34 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023
2023-04-24 03:23:34 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://jsonplaceholder.typicode.com/todos/1> (referer: None)
2023-04-24 03:23:34 [asyncio] DEBUG: Using selector: KqueueSelector
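One detail that makes me suspect a reactor/event-loop conflict: that very last log line comes from asyncio, not Scrapy, so something may be installing Twisted's asyncio reactor before (or after) Scrapy sets up its own. A small stdlib-only check I can drop into run_spider() before constructing CrawlerProcess, to see whether a Twisted reactor has already been installed as a side effect of the Django/daphne imports (the helper name is mine, just for diagnostics):

```python
import sys


def reactor_already_installed():
    # Importing twisted.internet.reactor would itself install the default
    # reactor as a side effect, so instead we just check whether that
    # module has already been imported by something else (e.g. daphne).
    return 'twisted.internet.reactor' in sys.modules


print(reactor_already_installed())
```

In a fresh process that hasn't touched Twisted yet this prints False; if it prints True right before CrawlerProcess() in my setup, that would suggest the daphne import chain already installed a reactor that Scrapy then conflicts with.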