I'm trying to send ~400 HTTP GET requests and collect the results. I'm running from Django. My solution was to use Celery with gevent.
To start the Celery tasks I call get_reports:
def get_reports(self, clients, *args, **kw):
    # Build one subtask per client and route it to the io_bound queue
    sub_tasks = []
    for client in clients:
        s = self.get_report_task.s(self, client, *args, **kw).set(queue='io_bound')
        sub_tasks.append(s)
    # Run all subtasks as a group and block until every result arrives
    res = celery.group(*sub_tasks)()
    reports = res.get(timeout=30, interval=0.001)
    return reports
@celery.task
def get_report_task(self, client, *args, **kw):
    report = send_http_request(...)
    return report
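For context, send_http_request is essentially a thin wrapper around a single GET. A minimal sketch of that shape, assuming the requests library (the signature and parameters here are illustrative, not my actual code):

import requests

def send_http_request(url, params=None, timeout=10):
    # Hypothetical sketch of the wrapper. Under the gevent pool the
    # blocking socket calls are monkey-patched and yield cooperatively,
    # so many of these can be in flight per worker at once.
    resp = requests.get(url, params=params, timeout=timeout)
    resp.raise_for_status()
    return resp.json()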
I use 4 workers:
manage celery worker -P gevent --concurrency=100 -n a0 -Q io_bound
manage celery worker -P gevent --concurrency=100 -n a1 -Q io_bound
manage celery worker -P gevent --concurrency=100 -n a2 -Q io_bound
manage celery worker -P gevent --concurrency=100 -n a3 -Q io_bound
And I use RabbitMQ as the broker.
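For completeness, the broker is configured with the stock AMQP URL; a sketch, assuming a local RabbitMQ with default guest credentials (Celery 3.x-style settings in Django's settings.py):

# Assumed configuration -- host and credentials are the RabbitMQ defaults
BROKER_URL = 'amqp://guest:guest@localhost:5672//'
CELERY_RESULT_BACKEND = 'amqp'   # results travel back over RabbitMQ as well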
And although it works much faster than running the requests sequentially (400 requests took ~23 seconds), I noticed that most of that time was overhead from Celery itself, i.e. if I changed get_report_task like this:
@celery.task
def get_report_task(self, client, *args, **kw):
    return []
the whole operation still took ~19 seconds. That means I spend 19 seconds just on sending all the tasks to Celery and getting the results back.
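For reference, the numbers above come from a plain wall-clock timer around the dispatch-and-collect call; roughly like this (a sketch, assuming get_reports is reachable as a plain callable here):

import time

start = time.time()
reports = get_reports(clients)   # queues 400 tasks, waits for all results
print("%d results in %.1f seconds" % (len(reports), time.time() - start))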
The queuing rate of messages to RabbitMQ seems to be capped at about 28 messages/sec, and I think this is my bottleneck.
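One way to check whether the broker itself caps the publish rate is to publish directly, bypassing Celery entirely; a minimal sketch, assuming the pika client and a local RabbitMQ with default credentials:

import time
import pika

# Publish 400 small messages straight to RabbitMQ and time the loop,
# so any slowness here cannot be Celery overhead.
conn = pika.BlockingConnection(pika.ConnectionParameters('localhost'))
channel = conn.channel()
channel.queue_declare(queue='rate_test')

start = time.time()
for i in range(400):
    channel.basic_publish(exchange='', routing_key='rate_test', body='x')
elapsed = time.time() - start
print("%.0f msg/sec" % (400 / elapsed))
conn.close()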
I'm running on a Windows 8 machine, if that matters.
Some of the things I've tried:
- using Redis as the broker
- using Redis as the results backend
- tweaking these settings:
BROKER_POOL_LIMIT = 500              # size of the broker connection pool
CELERYD_PREFETCH_MULTIPLIER = 0      # 0 = no limit on prefetched tasks
CELERYD_MAX_TASKS_PER_CHILD = 100    # recycle a worker child after 100 tasks
CELERY_ACKS_LATE = False             # ack before execution (the default)
CELERY_DISABLE_RATE_LIMITS = True    # skip rate-limit bookkeeping entirely
I'm looking for any suggestions that will help speed things up.