I have made a scraper that scrapes links from a web page, and I want to run it every hour. The scraper lives inside a Django app, but Django alone cannot run it on a schedule because views depend on the request/response cycle. To solve this I decided to use a Python library named Celery, and following its documentation I wrote the celery.py and tasks.py files below.
My Django project structure is like this:

newsportal
- newsportal
  - settings.py
  - celery.py
  - __init__.py
- news
  - tasks.py
  - views.py
  - models.py
celery.py has the following code:
from __future__ import absolute_import
import os
from celery import Celery

# set the default Django settings module for the 'celery' program.
os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'newsportal.settings')

from django.conf import settings  # noqa

app = Celery('newsportal')

# Using a string here means the worker will not have to
# pickle the object when using Windows.
app.config_from_object('django.conf:settings')
app.autodiscover_tasks(lambda: settings.INSTALLED_APPS)

@app.task(bind=True)
def debug_task(self):
    print('Request: {0!r}'.format(self.request))
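For the hourly schedule, Celery 3.1 (the version django-celery 3.1.17 targets) reads a CELERYBEAT_SCHEDULE setting from settings.py. A sketch, assuming the task registers under the dotted path news.tasks.news from the layout above:

```python
from datetime import timedelta

# settings.py fragment: tell celery beat to fire the task every hour.
# The entry name is arbitrary; the 'task' path assumes news/tasks.py
# defines a shared_task named news().
CELERYBEAT_SCHEDULE = {
    'scrape-news-every-hour': {
        'task': 'news.tasks.news',
        'schedule': timedelta(hours=1),
    },
}
```

A beat process must run alongside the worker for the schedule to fire, e.g. `celery -A newsportal worker -B -l info` during development.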
__init__.py has the following code:
from __future__ import absolute_import
# This will make sure the app is always imported when
# Django starts so that shared_task will use this app.
from .celery import app as celery_app # noqa
while tasks.py has the following code:
from __future__ import absolute_import

from celery import shared_task
from crawler import crawler
from .models import News

@shared_task
def news():
    '''
    Scrape all links.
    '''
    allnews = []  # store dict objects returned by the crawler
    allnews.append(crawler())
    for news_dict in allnews:
        for title, url in news_dict.items():
            # save every scraped news item in the database
            News.objects.create(title=title, url=url, source=source)
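The flatten-and-save loop can be exercised without Django or a live crawler by stubbing both. The fake_crawler below is purely illustrative and only mimics the {title: url} dict shape the task assumes:

```python
# Stub standing in for crawler(): returns the assumed {title: url} dict.
def fake_crawler():
    return {'Example headline': 'https://example.com/story'}

# Same iteration as in tasks.py, but collecting rows into a list
# instead of calling News.objects.create().
saved = []
for news_dict in [fake_crawler()]:
    for title, url in news_dict.items():
        saved.append({'title': title, 'url': url})
```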
What I want is to run the above news() function every hour and save its results to the Django database. How can I achieve this?
According to the Celery docs, to save the results produced by the worker we need to install django-celery==3.1.17, which I have already installed, and run the migrations.
For the database result backend, the Celery docs say to put

app.conf.update(
    CELERY_RESULT_BACKEND='djcelery.backends.database:DatabaseBackend',
)

in the settings.py file. When I put this code in `settings.py`, I got the error:
settings.py", line 141, in <module>
app.conf.update(
NameError: name 'app' is not defined
even though I had already added the following lines to the settings.py file:

from __future__ import absolute_import
BROKER_URL = 'redis://localhost'
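From the traceback, `app` only exists in celery.py, not in settings.py, which would explain the NameError. Since celery.py already calls app.config_from_object('django.conf:settings'), presumably the backend can be declared as a plain setting instead (a sketch, using the djcelery backend string from the docs):

```python
# settings.py fragment: declare the result backend as an ordinary
# Django setting; celery.py's config_from_object('django.conf:settings')
# picks it up, so no `app` object is needed here.
BROKER_URL = 'redis://localhost'
CELERY_RESULT_BACKEND = 'djcelery.backends.database:DatabaseBackend'
```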
The main things I want to do are:
- Run the above crawler every hour and save its results in the database (the News model). How can I accomplish this using Celery, or am I missing something?
- Are there any alternative ways to accomplish this task?
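One alternative that avoids Celery entirely is a small standalone script driven by cron. This is only a sketch: the file name scrape_news.py and the path in the crontab line are hypothetical, and django.setup() requires Django 1.7 or newer.

```python
# scrape_news.py -- hypothetical standalone entry point for cron.
# It configures Django first, then calls the task function directly,
# so no broker or worker process is needed.
import os

os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'newsportal.settings')

def run():
    import django
    django.setup()               # initialize the Django app registry
    from news.tasks import news
    news()                       # run the scrape synchronously

# When invoked by cron, call run() under an `if __name__ == '__main__':`
# guard. A crontab entry to run it every hour might look like:
#   0 * * * *  /usr/bin/python /path/to/scrape_news.py
```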