I am using Celery to distribute tasks to multiple servers. For some reason, adding 7,000 tasks to the queue is incredibly slow and appears to be CPU-bound: the code below, which does nothing but enqueue the tasks, takes about 12 seconds to run.
import time

start = time.time()
for url in urls:
    # Enqueue one fetch task per URL on a dedicated queue.
    fetch_url.apply_async((url.strip(),), queue='fetch_url_queue')
print time.time() - start
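For reference, the app and task on the master server are defined roughly as follows (the broker URL is a placeholder and the task body is omitted, since only the enqueueing side is being timed):

from celery import Celery

# Placeholder broker URL; the brokers actually tried are listed below.
app = Celery('tasks', broker='redis://broker-host:6379/0')

@app.task
def fetch_url(url):
    # Real body omitted; it fetches the URL on a worker and never runs during enqueueing.
    pass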
Switching brokers (I have tried Redis, RabbitMQ, and the pyamqp transport) has no significant effect.
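Concretely, switching brokers only meant swapping the broker URL passed to the Celery constructor; the hostnames and credentials below are placeholders:

from celery import Celery

app = Celery('tasks', broker='redis://broker-host:6379/0')                 # Redis
# app = Celery('tasks', broker='amqp://guest:guest@broker-host:5672//')    # RabbitMQ
# app = Celery('tasks', broker='pyamqp://guest:guest@broker-host:5672//')  # pyamqp transport

Each gave roughly the same enqueue time.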
Reducing the number of workers, each of which runs on its own server separate from the master server that enqueues the tasks, also has no significant effect.
The URLs being passed are very short, only about 80 characters each. The network latency between any two servers in my setup is under a millisecond.
I must be doing something wrong. Surely Celery can enqueue 7,000 tasks in far less than several seconds.