
I have a Java App Engine project and I am using DeferredTasks for push queues.

import com.google.appengine.api.taskqueue.DeferredTask;

/** A hypothetical expensive operation we want to defer to a background task. */
public static class ExpensiveOperation implements DeferredTask {

  @Override
  public void run() {
    System.out.println("Doing an expensive operation...");
    // expensive operation to be backgrounded goes here
  }
}

I want to be able to create multiple shards of a DeferredTask to get more throughput. Basically, I want to run one DeferredTask that then enqueues many more DeferredTasks (up to 1,000 of them) -- essentially a fan-out task. How can I do that?

One issue is that when creating tasks you need to specify their names in the queue.yaml file. But if I want 1,000 tasks, do I really need to list 1,000 of them in that file? It would get very tedious to write out "task-1", "task-2", etc.

Is there a better way to do this?

BlueBoy

1 Answer


This is usually done by passing a shard parameter to each task and reusing the same queue. As your example shows, the entire Java object is serialized with DeferredTask, so you can simply pass in any values you want through a constructor. E.g.

public static class ShardedOperation implements DeferredTask {
  private final int shard;

  public ShardedOperation(int shard) {
    this.shard = shard;
  }

  @Override
  public void run() {
    // Do this shard's slice of the expensive work here.
  }
}
...
// In the parent task that fans out the work:
@Override
public void run() {
  System.out.println("Fanning out an expensive operation...");
  Queue queue = QueueFactory.getDefaultQueue();
  for (int i = 0; i < 1000; ++i) {
    queue.add(TaskOptions.Builder.withPayload(new ShardedOperation(i)));
  }
}

This matches the section you linked to (https://cloud.google.com/appengine/docs/standard/java/taskqueue/push/creating-tasks#using_the_instead_of_a_worker_service), where the default queue is used.
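For completeness, a minimal sketch of how the fan-out could be kicked off. The FanOutOperation name is just illustrative (it would be whatever class holds the run() above), and it uses the default queue as in the docs. Note that queue.yaml defines queues, not individual tasks; tasks get auto-generated names unless you set one, so you don't need to list 1,000 task names anywhere.

import com.google.appengine.api.taskqueue.DeferredTask;
import com.google.appengine.api.taskqueue.Queue;
import com.google.appengine.api.taskqueue.QueueFactory;
import com.google.appengine.api.taskqueue.TaskOptions;

/** Hypothetical parent task whose run() is the fan-out loop shown above. */
public static class FanOutOperation implements DeferredTask {

  @Override
  public void run() {
    Queue queue = QueueFactory.getDefaultQueue();
    for (int i = 0; i < 1000; ++i) {
      // Each shard is serialized with its index and runs as its own push task.
      queue.add(TaskOptions.Builder.withPayload(new ShardedOperation(i)));
    }
  }
}

// A single add() from a request handler starts the whole fan-out:
// QueueFactory.getDefaultQueue().add(TaskOptions.Builder.withPayload(new FanOutOperation()));

If enqueuing 1,000 tasks one at a time is too slow, Queue.add(Iterable<TaskOptions>) lets you add them in batches of up to 100 tasks per call.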

Jim Morrison
  • If I had to loop through say, 100,000 entities, what would be a good way to make each sharded task read and update those entities? So with 1,000 shards, each shard would read and write 100 entities. – BlueBoy Jan 23 '22 at 04:50
  • You could do a keys-only query to fetch the 100k entity keys and pass 100 keys to each worker. Or you could pass start and end cursors to each worker instead of individual keys (a rough sketch follows below). – Jim Morrison Jan 24 '22 at 05:44
  • Is querying 100k entity keys into memory efficient though? Would there be any issues querying that many? Seems like a lot. But if it can handle it, that is a good suggestion. On that note, how many keys do you think an application can query and load into memory before it becomes too much? – BlueBoy Jan 24 '22 at 07:35
  • If your keys are only 100 bytes, then 100k keys are only 10MB of data. If you are concerned of this scale, you should consider creating a dataflow job (https://cloud.google.com/dataflow/docs) from your deferred task handler. – Jim Morrison Jan 25 '22 at 03:52
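A rough sketch of the keys-only approach from the comments, with some assumptions: the kind name "Item" is a placeholder, and ShardedOperation is imagined here as a variant whose constructor takes a List<Key> instead of a shard index (Key is Serializable, so it survives the DeferredTask serialization).

import com.google.appengine.api.datastore.DatastoreService;
import com.google.appengine.api.datastore.DatastoreServiceFactory;
import com.google.appengine.api.datastore.Entity;
import com.google.appengine.api.datastore.FetchOptions;
import com.google.appengine.api.datastore.Key;
import com.google.appengine.api.datastore.Query;
import com.google.appengine.api.taskqueue.Queue;
import com.google.appengine.api.taskqueue.QueueFactory;
import com.google.appengine.api.taskqueue.TaskOptions;
import java.util.ArrayList;
import java.util.List;

static void fanOutOverKeys() {
  // Keys-only query: only keys come back, not full entities.
  DatastoreService datastore = DatastoreServiceFactory.getDatastoreService();
  Query query = new Query("Item").setKeysOnly();  // "Item" is a placeholder kind
  List<Key> keys = new ArrayList<>();
  for (Entity e : datastore.prepare(query).asIterable(FetchOptions.Builder.withChunkSize(1000))) {
    keys.add(e.getKey());
  }

  // Hand each worker a batch of ~100 keys to read and update.
  int batchSize = 100;
  Queue queue = QueueFactory.getDefaultQueue();
  for (int start = 0; start < keys.size(); start += batchSize) {
    int end = Math.min(start + batchSize, keys.size());
    // Copy the sublist so only this batch is serialized into the task payload.
    queue.add(TaskOptions.Builder.withPayload(
        new ShardedOperation(new ArrayList<>(keys.subList(start, end)))));
  }
}

The cursor variant mentioned above would avoid materializing all 100k keys in the parent task: each worker would get a start/end cursor and run the query over just its own slice.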