Background: I am using Celery to build a scheduling system that crawls websites on a daily basis. We crawl approximately 1 million URLs daily, so it is becoming difficult to manage things at a fine-grained level. We thought Celery could handle this much better than our current system does.

Problem: I have 1000 URLs for a domain. My plan is to divide the 1000 URLs into n equal chunks, create a task for each chunk, and schedule it using Celery. However, I am not able to create (register) tasks dynamically. I also need to enforce a politeness policy. How can I create tasks on the fly in Celery? I could not find any documentation for this.

Am I going in the right direction in solving this?

James Z
Sandeep

1 Answer

What do you mean by creating tasks on the fly?

You write a single task that crawls a website and then call it like this:

crawl_website.delay(url='http://example.com')
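
For the chunked case from the question, a rough sketch could look like the following. It assumes that one task is registered once and called many times with different arguments, rather than registering a new task per chunk. The helper names `chunked` and `schedule_domain`, and the `submit` callback, are made up for illustration and are not part of Celery; the politeness delay is modelled simply as a growing countdown per chunk.

```python
from typing import Any, Callable, Iterator, List, Sequence


def chunked(urls: Sequence[str], n_chunks: int) -> Iterator[List[str]]:
    """Split urls into at most n_chunks nearly equal, contiguous chunks."""
    if n_chunks < 1:
        raise ValueError("n_chunks must be >= 1")
    size, extra = divmod(len(urls), n_chunks)
    start = 0
    for i in range(n_chunks):
        # The first `extra` chunks get one additional URL each.
        end = start + size + (1 if i < extra else 0)
        if end > start:  # skip empty chunks when there are fewer URLs than chunks
            yield list(urls[start:end])
        start = end


def schedule_domain(
    urls: Sequence[str],
    n_chunks: int,
    delay_between_chunks: float,
    submit: Callable[[List[str], float], Any],
) -> List[Any]:
    """Enqueue one call per chunk, staggered by delay_between_chunks seconds.

    `submit(chunk, countdown)` is a hypothetical callback that does the actual
    enqueueing; it is kept abstract so this helper does not depend on Celery.
    """
    results = []
    for i, chunk in enumerate(chunked(urls, n_chunks)):
        # Chunk i is asked to start i * delay seconds from now.
        results.append(submit(chunk, i * delay_between_chunks))
    return results
```

With Celery, `submit` could be something like `lambda chunk, countdown: crawl_chunk.apply_async(args=[chunk], countdown=countdown)`, where `crawl_chunk` is an assumed task defined once in the worker code. Because only the arguments change, new chunks can be queued while the workers are already running.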
Krzysztof Szularz
  • I mean creating or registering tasks dynamically and scheduling them. Right now, I am doing it by scheduling all the tasks in the celeryconfig file. I should be able to add new tasks dynamically while the Celery workers are already running. – Sandeep Aug 07 '14 at 10:01