
Let's say I add 100 push tasks (as group 1) to my task queue, then another 200 tasks (as group 2) to the same queue. How can I tell when all the tasks of group 1 are finished?

It looks like QueueStatistics won't help here, and tag only works with pull queues.

And I can't use separate queues, since I may have hundreds of groups.

LA_
  • It certainly won't be easy. You'd probably want a sharded counter in the datastore that increments whenever a task in its group completes. You could then check the number of completed tasks for the group against the number of tasks enqueued and see if they match. This still might not be perfect, as I believe tasks are allowed to run twice in some circumstances, so your sharded counter will also need a reliable key so that the same task executed twice sets the same counter entity. – mgilson Jan 13 '16 at 19:28
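
A sketch of the dedup half of that suggestion, using a hypothetical TaskDone marker entity keyed by something stable per task (the task name from the X-AppEngine-TaskName request header, for instance):

from google.appengine.ext import ndb

class TaskDone(ndb.Model):
  """Hypothetical marker entity; its key id is a stable per-task identifier."""
  pass

@ndb.transactional
def mark_complete_once(task_id):
  # Returns True only for the first execution of a given task; a retried
  # or duplicated run finds the marker already present and is ignored.
  key = ndb.Key(TaskDone, task_id)
  if key.get() is not None:
    return False
  TaskDone(key=key).put()
  return True

Only when mark_complete_once() returns True would you go on to increment the group's completion counter and compare it against the number of tasks enqueued.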

2 Answers

I would probably solve it by using a sharded counter in the datastore, as @mgilson suggested, and decorating my deferred functions to run a callback when the tasks are done running.

I think something like this is what you are looking for if you include the code at https://cloud.google.com/appengine/articles/sharding_counters?hl=en and write a decrement function to complement the increment one (a rough sketch of such a decrement follows the example below).

import functools
import logging
import random
import time

from google.appengine.ext import deferred

# increment(), get_count() and the extra decrement() are the sharded counter
# functions from the article linked above.

def done_work():
  logging.info('work done!')

def worker(callback=None):
  def fst(f):
    # functools.wraps keeps the wrapper reachable under the original
    # function's name, which deferred needs in order to pickle it.
    @functools.wraps(f)
    def snd(*args, **kwargs):
      # Pull the counter key out of the kwargs before running the real task.
      key = kwargs.pop('shard_key')

      retval = f(*args, **kwargs)

      # Record this task's completion; fire the callback once the group's
      # counter drains back to zero.
      decrement(key)
      if get_count(key) == 0:
        callback()

      return retval
    return snd
  return fst

# Decorate at module level: deferred can't pickle a closure created inside
# make_some_tasks(), but it can pickle a module-level function by name.
@worker(callback=done_work)
def func(n):
  # do some work
  time.sleep(random.randint(1, 10) / 10.0)
  logging.info('task #{:d}'.format(n))

def make_some_tasks():
  key = 'group-{:d}'.format(random.randint(0, 1000))
  for n in xrange(0, 100):
    increment(key)
    deferred.defer(func, n, shard_key=key)
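
A rough sketch of that decrement, assuming the GeneralCounterShard model and SHARD_KEY_TEMPLATE helper from the article's code (the exact names may differ in your copy):

import random

from google.appengine.ext import ndb

@ndb.transactional
def decrement(name, num_shards=20):
  """Mirror of the article's increment(): subtract 1 from a random shard."""
  index = random.randint(0, num_shards - 1)
  shard_key_string = SHARD_KEY_TEMPLATE.format(name, index)
  counter = GeneralCounterShard.get_by_id(shard_key_string)
  if counter is None:
    counter = GeneralCounterShard(id=shard_key_string)
  counter.count -= 1
  counter.put()
  # If you keep the article's memcache-cached get_count(), also adjust or
  # invalidate that cache here so the count doesn't go stale.
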
daniel

Tasks are not guaranteed to run only once; occasionally even successfully executed tasks may be repeated. Here's such an example: GAE deferred task retried due to "instance unavailable" despite having already succeeded.

Because of this, using a counter incremented at task enqueueing and decremented at task completion wouldn't work - it would be decremented twice in such a duplicate execution case, throwing the whole computation off.

The only reliable way of keeping track of task completion (that I can think of) is to independently track each individual enqueued task. You can do that using the task names (either specified or auto-assigned after successful enqueueing) - they are unique for a given queue. Task names to be tracked can be kept in task lists persisted in the datastore, for example.
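
A minimal sketch of that approach, with hypothetical PendingTask/TaskGroup names, assuming (as noted above) that the auto-assigned name is available on the Task object after a successful add; the task handler would learn its own name from the X-AppEngine-TaskName request header:

from google.appengine.api import taskqueue
from google.appengine.ext import ndb

# Hypothetical model: one entity per enqueued task, keyed by the task name,
# stored under a per-group ancestor so group membership is queryable.
class PendingTask(ndb.Model):
  pass

def enqueue_group(group_id, payloads):
  group_key = ndb.Key('TaskGroup', group_id)
  for payload in payloads:
    task = taskqueue.add(url='/worker',
                         params={'group': group_id, 'payload': payload})
    # task.name is the queue-unique name assigned at enqueueing time
    PendingTask(id=task.name, parent=group_key).put()

def mark_done(group_id, task_name):
  # Called from the task handler with its X-AppEngine-TaskName header value.
  # Deleting by key is idempotent, so a repeated execution does no harm.
  ndb.Key('TaskGroup', group_id, PendingTask, task_name).delete()

def group_finished(group_id):
  # The group is complete once no pending entities remain under its ancestor.
  ancestor = ndb.Key('TaskGroup', group_id)
  return PendingTask.query(ancestor=ancestor).get(keys_only=True) is None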

Note: this is just the theoretical answer I arrived at when asking myself the same question; I didn't get to actually test it.

Dan Cornilescu