7

What's the best way to handle tasks executed in Celery where the result is large? I'm thinking of things like table dumps, where I might be returning data in the hundreds of megabytes.

I'm thinking that the naive approach of cramming the result into the result database is not going to serve me here, much less if I use AMQP as my result backend. However, some of these exports are latency-sensitive: depending on the particular instance of the export, I sometimes have to block until the task returns and emit the export data directly from the task client (an HTTP request comes in for export content that doesn't exist yet, and it must be produced and provided in the response to that request, no matter how long that takes).
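To make the blocking case concrete, what I have in mind is roughly this (the task and function names are just placeholders):

from myapp.tasks import generate_export  # hypothetical task

def handle_export_request(export_id):
    # The export doesn't exist yet, so block on the task and return its
    # (potentially huge) result directly in the HTTP response body.
    async_result = generate_export.delay(export_id)
    return async_result.get()  # the whole dump goes through the result backend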

So, what's the best way to write tasks for this?

Chris R

2 Answers

5

One option would be to have a static HTTP server running on all of your worker machines. Your task can then dump the large result to a unique file in the static root and return a URL reference to the file. The receiver can then fetch the result at its leisure.

E.g. something vaguely like this:

import socket
from celery import task  # or app.task / shared_task in newer Celery versions

@task
def dump_db(db):
    # Dump the database to /srv/http/static/<db>.sql (dump code omitted)
    return 'http://%s/%s.sql' % (socket.gethostname(), db)

You would of course need some means of reaping old files and of guaranteeing unique filenames, and there are probably other issues, but you get the general idea.
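On the consuming side, something roughly like this should work (using requests to stream the file; the module path and chunk size are illustrative, and how you feed the chunks into a response depends on your HTTP stack):

import requests
from myapp.tasks import dump_db  # the task sketched above (module name is illustrative)

def stream_dump(db):
    # Block until the worker has written the file, then stream it from the
    # worker's static server instead of pushing it through the result backend.
    url = dump_db.delay(db).get()  # the result is just a short URL string
    with requests.get(url, stream=True) as resp:
        resp.raise_for_status()
        for chunk in resp.iter_content(chunk_size=1 << 20):
            yield chunk  # feed these chunks into a streaming HTTP response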

Alec Thomas
1

I handle this by structuring my app to write the multi-megabyte results into files, which I then mmap into memory so they are shared among all processes that use that data... This totally finesses the question of how to get the results to another machine, but if the results are that large, it sounds like these tasks are internal tasks that coordinate between server processes.
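A minimal sketch of the file-plus-mmap part (the path is just an example):

import mmap

def open_result(path):
    # Map the result file into memory; the OS shares the pages between all
    # local processes that map the same file, so nothing gets copied around.
    with open(path, 'rb') as f:
        return mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)

# In each process that needs the data:
# data = open_result('/var/tmp/results/export-1234.bin')
# header = data[:1024]  # slice without reading the whole file into memory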

Henry Crutcher