I have got a program that handle about 500 000 files {Ai} and for each file, it will fetch a definition {Di} for the parsing.
For now, each file {Ai} is parsed by a dedicated celery task and each time the definition file {Di} is parsed again to generate an object. This object is used for the parsing of the file {Ai} (JSON representation).
I would like to store the definition file (generated object) {Di(object)} to make it available for whole task.
So I wonder what would be the best choice to manage it:
- Memcahe + Python-memcached,
- A Long running task to "store" the object with set(add)/get interface.
For performance and memory usage, what would be the best choice ?