The inspect.reserved()/scheduled() approach you mention may work, but it is
not always accurate, since it only takes into account the tasks
that the workers have already prefetched.
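To make that limitation concrete, here is a minimal sketch of tallying what the workers report. It assumes the reply has the shape returned by `app.control.inspect().reserved()`: a dict mapping worker names to lists of task descriptions, or `None` when no worker replies. The helper name `count_prefetched` and the sample data are made up for illustration.

```python
def count_prefetched(reply):
    """Count tasks per worker in an inspect()-style reply.

    `reply` is assumed to map worker names to lists of task dicts,
    or to be None when no worker answered in time.
    """
    if not reply:
        return {}
    return {worker: len(tasks) for worker, tasks in reply.items()}


# Hypothetical reply: two workers holding three prefetched tasks in total.
sample = {
    "worker1@example": [{"id": "a"}, {"id": "b"}],
    "worker2@example": [{"id": "c"}],
}
counts = count_prefetched(sample)
total = sum(counts.values())
```

Anything still waiting in the broker is invisible to this count, so the total is at best a lower bound on the backlog.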
Celery does not allow out-of-band operations on the queue, such as removing messages
from the queue or reordering them, because that will not scale in a distributed system.
The messages may not have reached the queue yet, which can result
in race conditions. In practice it is not a sequential queue with transactional
operations, but a stream of messages originating from several locations.
That is, the Celery API is based around strict message passing semantics.
It is possible to access the queue directly on some of the brokers
Celery supports (like Redis or a database), but that is not part of the public API,
and you are discouraged from doing so. Of course, if you are not planning on
supporting operations at scale, you should do whatever is most convenient for you
and disregard my advice.
If this is just to give the user some idea of when their job will be completed, then
I'm sure you could come up with an algorithm to predict when the task will be executed,
if you just had the length of the queue and the time at which each task was inserted.
The first is just redis.llen("celery"), and the latter you could
add yourself by listening to the task_sent signal:
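
    import time

    from celery.signals import task_sent

    @task_sent.connect
    def record_insertion_time(id, **kwargs):
        # `redis` is assumed to be a connected redis-py client.
        # Score each task id by the time it was sent (redis-py 3+ mapping form).
        redis.zadd("celery.insertion_times", {id: time.time()})
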
Using a sorted set here: http://redis.io/commands/zadd
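With the queue length and the insertion times in hand, a rough prediction could look something like the sketch below. It is only one possible heuristic, not anything Celery provides: it assumes tasks run roughly in the order they were sent, that you already have an average runtime per task from somewhere, and that completed tasks are removed from the set of pending insertion times. The function name and parameters are hypothetical.

```python
def estimate_start_delay(task_id, insertion_times, avg_runtime, concurrency=1):
    """Estimate seconds until `task_id` starts executing.

    insertion_times: dict of pending task id -> time it was sent
    avg_runtime:     assumed average seconds per task
    concurrency:     assumed number of tasks processed in parallel
    """
    if concurrency < 1:
        raise ValueError("concurrency must be at least 1")
    sent_at = insertion_times[task_id]
    # Tasks sent earlier are assumed to run first (roughly FIFO).
    ahead = sum(1 for t in insertion_times.values() if t < sent_at)
    return ahead * avg_runtime / concurrency


# Made-up data: three pending tasks, two of them sent before "c".
pending = {"a": 100.0, "b": 101.5, "c": 103.0}
delay = estimate_start_delay("c", pending, avg_runtime=2.0, concurrency=2)
```

Retries, ETAs, priorities and uneven task runtimes will all throw this off, so treat the result as a hint for the user rather than a promise.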
For a pure message passing solution you could use a dedicated monitor
that consumes the Celery event stream and predicts when tasks will finish.
http://docs.celeryproject.org/en/latest/userguide/monitoring.html#event-reference
(just noticed that the task-sent is missing the timestamp field in
the documentation, but a timestamp is sent with that event so I will fix it).
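As a sketch of what the core of such a monitor might do, the class below keeps track of tasks that have been sent but not finished, plus a running average runtime. It only handles plain event dicts with the fields listed in the event reference ("type", "uuid", "timestamp", and "runtime" on task-succeeded). Wiring it to the real event stream, for example through Celery's event receiver, is left out here, and the class name `RuntimeMonitor` is made up.

```python
class RuntimeMonitor:
    """Tracks pending tasks and the average runtime from task events."""

    def __init__(self):
        self.sent_at = {}          # uuid -> timestamp of the task-sent event
        self.finished = 0
        self.total_runtime = 0.0

    def on_event(self, event):
        kind = event["type"]
        if kind == "task-sent":
            self.sent_at[event["uuid"]] = event["timestamp"]
        elif kind == "task-succeeded":
            # The task is no longer pending; fold its runtime into the average.
            self.sent_at.pop(event["uuid"], None)
            self.finished += 1
            self.total_runtime += event["runtime"]

    def average_runtime(self):
        if not self.finished:
            return None
        return self.total_runtime / self.finished


# Made-up events, fed in by hand instead of from a live broker.
monitor = RuntimeMonitor()
for event in [
    {"type": "task-sent", "uuid": "a", "timestamp": 10.0},
    {"type": "task-sent", "uuid": "b", "timestamp": 11.0},
    {"type": "task-succeeded", "uuid": "a", "runtime": 3.0, "timestamp": 14.0},
]:
    monitor.on_event(event)
```

A real monitor would also want to handle task-failed and task-revoked events so that those tasks do not stay in the pending set forever.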
The events also contain a "clock" field, which is a logical clock
(see http://en.wikipedia.org/wiki/Lamport_timestamps).
It can be used to determine the order of events in a distributed
system without depending on the system time on each machine
being in sync (which is practically impossible to achieve).
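For illustration, ordering a batch of collected events by the logical clock rather than by wall-clock time could be sketched like this. The tie-break on "hostname" is the usual Lamport-style trick of using a process identifier to get a total order. The sample events and the presence of a "hostname" key on every event are assumptions made for the example.

```python
def order_events(events):
    """Sort events by logical clock, breaking ties by hostname."""
    return sorted(events, key=lambda e: (e["clock"], e["hostname"]))


# Made-up events, listed out of order on purpose.
events = [
    {"type": "task-succeeded", "clock": 7, "hostname": "worker2"},
    {"type": "task-sent", "clock": 3, "hostname": "web1"},
    {"type": "task-started", "clock": 5, "hostname": "worker2"},
]
ordered = [e["type"] for e in order_events(events)]
```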