I would like to implement a job queue in Mongo. The entire software system is already based around Mongo, so it seems natural and potentially a good fit.
The jobs collection stores each job's state as a document. I imagine this to be an uncapped collection based on my query needs. The job documents look like the following:
{
    "_id" : ObjectId("50a6742ee4b0a9a1c2cb4fd4"),
    "type" : "archive_job",
    "state" : 2,
    "priority" : 1,
    "timing" : {
        "submitted" : ISODate(...),
        "running" : ISODate(...),
        "completed" : ISODate(...),
        "failed" : null,
        "cancelled" : null
    },
    "payload" : {
        ...job-specific JSON...
    }
}
The typical access patterns for the jobs collection will be (a sketch of the matching index and queries follows this list):
- find unprocessed jobs to execute based on type, state, and priority, possibly with a range query on timing.submitted greater than the previous read time
- find all processed (completed, failed, cancelled) jobs
- find all unprocessed (submitted, running) jobs
- find a specific job by _id and retrieve its payload (while its state is running)
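
For reference, here is a rough sketch of the index and dequeue operation I have in mind, in mongo shell syntax. The numeric state codes are my own assumption (chosen so that 2 = completed matches the sample document above); the actual mapping may differ.

// Assumed state codes: 0 = submitted, 1 = running, 2 = completed,
// 3 = failed, 4 = cancelled.
var SUBMITTED = 0, RUNNING = 1;

// Compound index supporting the main dequeue query: equality on
// type and state, sort by priority, range on timing.submitted.
db.jobs.ensureIndex({ type: 1, state: 1, priority: -1, "timing.submitted": 1 });

// Atomically claim the highest-priority unprocessed job of a given
// type, flipping its state to running so no other worker grabs it.
var job = db.jobs.findAndModify({
    query: { type: "archive_job", state: SUBMITTED },
    sort: { priority: -1, "timing.submitted": 1 },
    update: { $set: { state: RUNNING, "timing.running": new Date() } },
    new: true
});

The findAndModify is what makes me think Mongo can work as a queue at all: claiming a job is a single atomic document update, so two workers cannot take the same job.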
The bulk of the queries will be to find unprocessed jobs that need execution. Would it be worthwhile to move payload into a jobs_payload collection so that document sizes do not vary greatly in the jobs collection?
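
If the payload stays inline, I could at least keep it off the wire for the hot scan with a projection; if it moves out, a running worker would fetch it by job id from the second collection. Both jobs_payload and its job_id field are hypothetical names here:

// Assumed state code, as in the sketch above: 0 = submitted.
var SUBMITTED = 0;

// Inline payload: scan unprocessed jobs but exclude the potentially
// large payload field from the results.
db.jobs.find({ type: "archive_job", state: SUBMITTED }, { payload: 0 });

// Split payload: a running worker fetches it separately by job id.
db.jobs_payload.findOne({ job_id: ObjectId("50a6742ee4b0a9a1c2cb4fd4") });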
Will the large number of processed (completed, failed, cancelled) jobs, relative to unprocessed jobs, eventually increase the working-set memory required for the jobs collection? Will access times for finding unprocessed jobs to execute get slower even with the right indices?
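
One mitigation I am considering, shown below purely as a sketch (the jobs_archive collection and the seven-day cutoff are made up), is periodically moving processed jobs out of the hot collection:

// Assumed state codes, as in the sketch above.
var COMPLETED = 2, FAILED = 3, CANCELLED = 4;

// Hypothetical cleanup pass: copy processed jobs older than a cutoff
// into an archive collection, then remove them from the hot one.
var cutoff = new Date(Date.now() - 7 * 24 * 60 * 60 * 1000); // 7 days ago
db.jobs.find({
    state: { $in: [COMPLETED, FAILED, CANCELLED] },
    "timing.submitted": { $lt: cutoff }
}).forEach(function (doc) {
    db.jobs_archive.insert(doc);
    db.jobs.remove({ _id: doc._id });
});

That would keep the unprocessed-jobs scan over a small collection, at the cost of querying two collections for historical reporting.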
What other alternatives and trade-offs can I make in the schema design?