
I'm writing a web application with Django where users can upload files with statistical data.

The data needs to be processed before it can be properly used (processing a single dataset can take up to a few minutes). My idea was to use a Python thread for this and offload the data processing into a separate thread.

However, since I'm using uWSGI, I've read about a feature called "Spoolers". The documentation on it is rather short, but I think it might be what I'm looking for. Unfortunately, the -Q option for uWSGI requires a directory, which confuses me.

Anyway, what are the best practices for implementing something like worker threads that don't block uWSGI's web workers, so I can reliably process data in the background while still having access to Django's database/models? Should I use threads instead?

BastiBen

All of the offloading subsystems need some kind of 'queue' to store the 'things to do'.

The uWSGI Spooler uses a printer-like approach where each file in the directory is a task. When the task is done, the file is removed. Other systems rely on heavier, more advanced servers such as RabbitMQ.
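To make the printer-like idea concrete, here is a minimal pure-Python sketch of a directory used as a task queue: one file per task, deleted once the task is done. It is not uWSGI's actual spooler code, and `enqueue` and `run_pending` are hypothetical names; it only illustrates why the spooler needs a directory to work with.

```python
import os
import tempfile
import uuid


def enqueue(spool_dir, payload):
    """Store one task as one file (hypothetical helper)."""
    name = uuid.uuid4().hex
    tmp_path = os.path.join(spool_dir, "." + name + ".tmp")
    with open(tmp_path, "w", encoding="utf-8") as f:
        f.write(payload)
    final_path = os.path.join(spool_dir, name + ".task")
    # Write first, then rename, so a worker never picks up a half-written task.
    os.replace(tmp_path, final_path)
    return final_path


def run_pending(spool_dir, handler):
    """Run every finished task file once and delete it afterwards (hypothetical helper)."""
    done = 0
    for entry in sorted(os.listdir(spool_dir)):
        if not entry.endswith(".task"):
            continue  # skip temporary files that are still being written
        path = os.path.join(spool_dir, entry)
        with open(path, encoding="utf-8") as f:
            payload = f.read()
        handler(payload)  # if this raises, the file stays and can be retried later
        os.remove(path)  # the task is done, so its file is removed
        done += 1
    return done


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as spool_dir:
        enqueue(spool_dir, "dataset 42")
        enqueue(spool_dir, "dataset 43")
        handled = run_pending(spool_dir, print)
        print(handled, "tasks done,", len(os.listdir(spool_dir)), "files left")
```

Because the queue lives on disk, pending tasks survive a process restart, which is the main advantage over keeping jobs in an in-memory thread queue.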

Finally, do not use the spooler's low-level API directly; rely on the decorators instead:

http://projects.unbit.it/uwsgi/wiki/Decorators
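A minimal sketch of the decorator approach, assuming the `spool` decorator and its `.spool()` method from `uwsgidecorators` as described on the page above, and a uWSGI instance started with a spooler directory. `process_dataset`, `process_dataset_task` and `on_upload` are hypothetical names. Only the dataset's primary key is queued, because by default the spooled function receives a dict of string arguments rather than arbitrary objects. The `except ImportError` branch is a stand-in so the module can still be imported and tested outside uWSGI; check the argument types against your uWSGI version.

```python
try:
    # Only importable inside a uWSGI process (spooler enabled with --spooler / -Q <dir>).
    from uwsgidecorators import spool
except ImportError:
    # Hypothetical stand-in for running outside uWSGI (tests, manage.py shell):
    # it runs the task immediately instead of queueing it.
    def spool(func):
        func.spool = lambda **kwargs: func(kwargs)
        return func


processed = []  # hypothetical record of processed datasets


def process_dataset(dataset_id):
    """Hypothetical placeholder for the slow statistical processing."""
    processed.append(dataset_id)


@spool
def process_dataset_task(args):
    # Arguments arrive as a dict; values may be bytes on Python 3.
    dataset_id = args["dataset_id"]
    if isinstance(dataset_id, bytes):
        dataset_id = dataset_id.decode("utf-8")
    process_dataset(int(dataset_id))


def on_upload(dataset_id):
    """Call from the Django view after saving the upload; it does not wait for processing."""
    process_dataset_task.spool(dataset_id=str(dataset_id))
```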

roberto
  • Right now I'd have a database record for all (finished and unfinished) datasets, each of them using a 'status' field so I can provide feedback to the user at any time. What would I write to those files? The UUID or primary key for each dataset to be processed? Where does the spooler code actually run (in which process)? Can I access anything from there, like models/db, or are there any constraints? – BastiBen Aug 31 '12 at 05:17
  • The spooler can be seen as an 'unconnected' worker: it is a worker, but without the ability to receive requests on a socket. So a spooler process can access all of Django's internals. Another, more versatile solution is mules: they are a more primitive implementation that lets you interact with your database directly (without saving tasks in the spooler directory). http://projects.unbit.it/uwsgi/wiki/Mules – roberto Aug 31 '12 at 08:00
  • Looks like mules are the thing I'm looking for. Thank you! – BastiBen Aug 31 '12 at 09:02
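Following up on the comment thread, here is a minimal sketch of a background job that receives only a primary key and records its progress in a status field, as described in the first comment. It uses the standard-library `sqlite3` module in place of Django's ORM; the table layout, the status values ('pending', 'processing', 'done', 'failed') and `process_by_pk` are hypothetical. The same function body could be called from a spooler task or a mule.

```python
import sqlite3

# Hypothetical table standing in for a Django model with a "status" field.
SCHEMA = "CREATE TABLE dataset (id INTEGER PRIMARY KEY, status TEXT NOT NULL, result TEXT)"


def process_by_pk(conn, dataset_id, compute):
    """Process one dataset by primary key, updating 'status' so users can see progress."""
    with conn:  # commits on success, rolls back on exception
        claimed = conn.execute(
            "UPDATE dataset SET status = 'processing' WHERE id = ? AND status = 'pending'",
            (dataset_id,),
        ).rowcount
    if claimed == 0:
        return False  # unknown id, or the dataset was already claimed or finished
    try:
        result = compute(dataset_id)
    except Exception:
        with conn:
            conn.execute("UPDATE dataset SET status = 'failed' WHERE id = ?", (dataset_id,))
        raise  # do not swallow the error
    with conn:
        conn.execute(
            "UPDATE dataset SET status = 'done', result = ? WHERE id = ?",
            (str(result), dataset_id),
        )
    return True
```

Claiming the row with `WHERE status = 'pending'` keeps a second worker from processing the same dataset twice, and the view can simply read the status field to give the user feedback.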