7

Nearly synchronous works, too; basically, I want to delegate the data access and processing behind a web app to a task queue for most jobs. What's the lowest latency that I can reasonably expect for Celery tasks?

Update (for clarification)

I guess for clarity I should explain that throughput, while nice, is not a pressing concern for me; I won't need to scale in that direction for a while yet. Latency is the only criterion I'll be evaluating at the moment. I'm content to use task.apply if that's the only way it'll work, but I'd like to farm the work out a bit.
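For context, `task.apply` runs the task body locally and synchronously, while delegating (`task.delay`) hands it to a queue that a worker consumes. A toy sketch of that semantic difference, with no Celery involved (`process_data`, `apply_local`, and `delay_via_queue` are made-up stand-ins, not real API):

```python
# Toy sketch: local synchronous execution vs. delegating work via a queue.
# No Celery here; all names are hypothetical stand-ins.
import queue
import threading

task_queue = queue.Queue()

def process_data(x):
    return x * 2

def apply_local(func, args):
    # Like task.apply(): run in the current process and block for the result.
    return func(*args)

def delay_via_queue(func, args):
    # Like task.delay(): put the work on a queue and return a handle
    # the caller can wait on later.
    box, done = {}, threading.Event()
    task_queue.put((func, args, box, done))
    return box, done

def worker():
    # Stand-in for a worker process consuming the queue.
    while True:
        func, args, box, done = task_queue.get()
        box["result"] = func(*args)
        done.set()

threading.Thread(target=worker, daemon=True).start()

print(apply_local(process_data, (21,)))   # 42, no queue involved
box, done = delay_via_queue(process_data, (21,))
done.wait(timeout=2)
print(box["result"])                      # 42, via the "worker"
```

The delegated path returns before the work is done, which is exactly where the extra latency (messaging, scheduling) comes from.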

Chris R
  • This really depends on hardware, network, broker and configuration. I've been able to get ~1000 tasks/s (roundtrip: publish and get results) on my MacBook Pro 2x core. You won't get that with the default configuration though, as it's optimized for a middle ground between lots of short and some very long tasks. – asksol Nov 19 '10 at 15:43
  • That sounds like a good place to start; I can scale the number of workers pretty much forever, so could you provide guidance on how to get that low-latency configuration set up? I've got both short (return data for immediate use) and long (generate a big report) tasks to execute, if it makes a difference. Alternately, where's a good blog or doc with details on how to do this? – Chris R Nov 19 '10 at 15:54
  • You can start with CELERYD_PREFETCH_MULTIPLIER=0 (or raising it from the default of 4 to 32/64). Then there's CELERY_DISABLE_RATE_LIMITS=True, which disables parts of the "machinery" if you don't need rate limits. – asksol Nov 20 '10 at 18:32
  • That seems reasonable. So, I gather that this means the workers will already have 'reservations' for the next task with this setting? How do I ensure that the result is received with reasonable celerity? I want to be able to have behaviour that's close to the behaviour of `task.apply` here in terms of speed... will I get that with this? – Chris R Nov 21 '10 at 18:07
  • With 0 it reserves as many tasks as possible (even millions), so not a good idea, but you set it to a large number. Remember that the broker will redeliver the tasks if the worker crashes. – asksol Nov 22 '10 at 10:13
  • You will *never* get the same speed as with `.apply`, as you need to send the task as a message, there's network latency and so on. But if using the amqp result backend roundtrip is very short. Just make sure the queue is processed fast enough, and that it doesn't contain a lot of tasks to process. – asksol Nov 22 '10 at 10:13
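Pulling the comments above together, a low-latency configuration sketch using the Celery 2.x-era setting names mentioned there (the prefetch value is a starting point to tune, not a benchmark):

```python
# celeryconfig.py -- sketch of a low-latency configuration based on the
# comments above (Celery 2.x setting names; values are assumptions to tune).

# Prefetch a large batch of task messages per worker instead of the default 4.
# 0 would mean "reserve as many tasks as possible", which over-reserves;
# a large finite number was suggested instead.
CELERYD_PREFETCH_MULTIPLIER = 64

# Skip the rate-limiting machinery entirely if you don't use rate limits.
CELERY_DISABLE_RATE_LIMITS = True

# The amqp result backend keeps the publish-and-get-result roundtrip short.
CELERY_RESULT_BACKEND = "amqp"
```

Remember the caveat from the comments: with a high prefetch value, the broker will redeliver reserved tasks if a worker crashes.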

1 Answer

7

When I say throughput I mean the average latency from sending a task until it has been executed. By roundtrip I mean the average time it takes to send a task, execute it, send the result back, and retrieve the result.

As I said in the comments, I don't currently have any official numbers to share, but with the right configuration Celery is low-latency compared to many other solutions. Still, it does come with more overhead than executing a function locally, which is something to take into account when designing the granularity of a task.[1]
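The granularity point can be made concrete with a back-of-the-envelope model: each task carries a fixed messaging/scheduling overhead, so many tiny tasks are dominated by overhead while chunked tasks amortize it. The numbers below are purely illustrative assumptions (and the model ignores worker parallelism), not Celery measurements:

```python
# Back-of-the-envelope model of task granularity. The per-task overhead
# and per-item cost are illustrative assumptions, not measured numbers.
PER_TASK_OVERHEAD_MS = 5.0   # messaging + scheduling cost per task (assumed)
PER_ITEM_WORK_MS = 0.1       # actual work per item (assumed)

def total_time_ms(n_items, chunk_size):
    n_tasks = -(-n_items // chunk_size)  # ceiling division
    return n_tasks * PER_TASK_OVERHEAD_MS + n_items * PER_ITEM_WORK_MS

fine = total_time_ms(10_000, 1)      # one task per item: overhead dominates
coarse = total_time_ms(10_000, 500)  # 500 items per task: overhead amortized
print(fine, coarse)
```

Under these assumed numbers the one-task-per-item split spends fifty times as long on overhead as on the actual work, which is why task granularity matters even when raw latency per task is small.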

I'm currently writing a performance guide that may be of interest: http://ask.github.com/celery/userguide/optimizing.html

Feedback is welcome, and I'd like to know about any other performance factors you're interested in.

[1] http://celeryq.org/docs/userguide/tasks.html#granularity

asksol