7

Is there a way to ensure FIFO (first in, first out) behavior with Task Queues on GAE?

GAE Documentation says that FIFO is one of the factors that affect task execution order, but the same documentation says that “the system's scheduling may 'jump' new tasks to the head of the queue” and I have confirmed this behavior with a test. The effect: my events are being processed out of order.

The docs say:

https://developers.google.com/appengine/docs/java/taskqueue/overview-push

The order in which tasks are executed depends on several factors:

The position of the task in the queue. App Engine attempts to process tasks based on FIFO (first in, first out) order. In general, tasks are inserted into the end of a queue, and executed from the head of the queue.

The backlog of tasks in the queue. The system attempts to deliver the lowest latency possible for any given task via specially optimized notifications to the scheduler. Thus, in the case that a queue has a large backlog of tasks, the system's scheduling may "jump" new tasks to the head of the queue.

The value of the task's etaMillis property. This property specifies the earliest time that a task can execute. App Engine always waits until after the specified ETA to process push tasks.

The value of the task's countdownMillis property. This property specifies the minimum number of seconds to wait before executing a task. Countdown and eta are mutually exclusive; if you specify one, do not specify the other.

What do I need to do? In my use case, I'll process 1-2 million events/day coming from vehicles. These events can be sent at any interval (1 second, 1 minute, or 1 hour). The order of event processing has to be guaranteed: I need to process events in timestamp order, where the timestamp is generated by an embedded device inside the vehicle.

What do I have now?

  1. A REST servlet that is called by the consumer and creates a Task (Event data is in the payload).

  2. After this, a worker servlet gets this Task and:

    • Deserializes the Event data;

    • Puts the Event in the Datastore;

    • Updates the Vehicle in the Datastore.

So, again, is there any way to ensure strict FIFO behavior? Or how can I improve this solution to achieve it?
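For reference, the pipeline described above can be modeled as a minimal in-memory sketch (plain Java, hypothetical names, no GAE APIs). The point of the model is that the worker processes tasks purely in arrival order; nothing ties processing order to the device timestamp:

```java
import java.util.*;

// Toy model of the two servlets (hypothetical names, no GAE API):
// the REST endpoint enqueues the raw event payload, and the worker
// deserializes it, stores the event, and updates the vehicle.
public class EventPipeline {
    private final Deque<String> queue = new ArrayDeque<>();       // stands in for the push queue
    public final Map<Long, String> eventStore = new TreeMap<>();  // stands in for the Event kind
    public final Map<String, Long> vehicleStore = new HashMap<>(); // vehicle -> last timestamp

    // 1. REST servlet: wrap the event in a task; payload is "vehicle,timestamp".
    public void receive(String payload) {
        queue.addLast(payload);
    }

    // 2. Worker servlet: deserialize, put the Event, update the Vehicle.
    public boolean workOne() {
        String payload = queue.pollFirst();
        if (payload == null) return false;
        String[] parts = payload.split(",");   // deserialize the Event data
        long ts = Long.parseLong(parts[1]);
        eventStore.put(ts, parts[0]);          // put Event in the "datastore"
        vehicleStore.put(parts[0], ts);        // update Vehicle in the "datastore"
        return true;
    }
}
```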

André Salvati
  • Why do you need strict FIFO? Remember that order-of-events is kind of fuzzy in a distributed system - even telling which of several nearly simultaneous requests occurred first is difficult (and often pointless) when your frontends and backends are distributed across multiple machines. What are you trying to achieve, as an end result? – Nick Johnson Apr 05 '12 at 06:55
  • We track public buses, so we need to know when it stopped in a bus stop, or when it opened a trip, or when it exceeded the velocity limits. The problem is that some events are related to previous events because we also do state management. For example: a bus can only 'open a trip' if in the previous event it recorded a 'closed trip'. So, you can imagine what can happen if I get these events out of order... – André Salvati Apr 05 '12 at 11:14
  • A task queue doesn't sound like the best way to do this in any case. What you want to do is use datastore transactions, and restrict the set of state transitions for a given bus to only those that are allowed by your state machine. – Nick Johnson Apr 06 '12 at 00:20
  • And how can I stop receiving events in case of maintenance or an app update without task queues? – André Salvati Apr 09 '12 at 20:15
  • I don't understand what you're asking. How do you normally receive event updates? – Nick Johnson Apr 10 '12 at 04:52
  • I don't receive event updates yet because it's a new project. But I need to pause and resume event processing in case of an app upload or other maintenance. Also, I need the robustness of async processing. – André Salvati Apr 10 '12 at 14:01
  • Pausing and resuming is a completely separate issue - no matter how you do it, you'll need some way to stop receiving updates if you don't want them. I would worry about 'robustness from async processing' if and when it becomes a problem; currently your problem is getting it to work at all, and you're trying to press the task queue into a role it wasn't designed for. – Nick Johnson Apr 11 '12 at 01:07
  • Nick, I resolved the problem with push queues. Take a look below. Thanks. – André Salvati Apr 11 '12 at 11:27

7 Answers

4

You need to approach this with three separate steps:

  1. Implement a Sharding Counter to generate a monotonically increasing ID. As much as I'd like to use the timestamp from Google's servers to indicate task ordering, it appears that timestamps between GAE servers might vary more than your requirement allows.

  2. Add your tasks to a Pull Queue instead of a Push Queue. When constructing your TaskOptions, add the ID obtained in step #1 as a tag. After adding the task, store the ID somewhere in your datastore.

  3. Have your worker servlet lease tasks by a specific tag from the Pull Queue. Query the datastore to get the earliest ID that you need to fetch, and use that ID as the lease tag. In this way, you can simulate FIFO behavior for your task queue.

After you finish processing, delete the ID from your datastore, and don't forget to delete the Task from your Pull Queue too. Also, I would recommend running your task consumption on a Backend.

UPDATE: As noted by Nick Johnson and mjaggard, sharding in step #1 doesn't seem to be viable for generating monotonically increasing IDs, so another source of IDs is needed. I seem to recall you were using timestamps generated by your vehicles; would it be possible to use those in lieu of a monotonically increasing ID?

Regardless of how you generate the IDs, the basic idea is to use the datastore's query mechanism to produce a FIFO ordering of tasks, and to use the task tag to pull a specific task from the TaskQueue.

There is a caveat, though. Due to the eventual-consistency read policy of the High Replication Datastore, if you choose HRD as your datastore (and you should; Master/Slave is deprecated as of April 4th, 2012), the query in step #3 might return stale data.
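Assuming some external source of monotonically increasing IDs, the lease-by-tag idea in steps #2 and #3 can be sketched as a toy model (plain Java, no GAE API; the real code would lease by tag from the pull queue, e.g. via Queue.leaseTasksByTag, and query the datastore for the earliest outstanding ID - here a TreeSet and a map stand in for both):

```java
import java.util.*;

// Toy model of the lease-by-tag scheme (not the real GAE API):
// pendingIds stands in for the datastore index of outstanding IDs,
// and tasksByTag stands in for a pull queue holding tasks keyed by tag.
public class TagLeaseModel {
    private final TreeSet<Long> pendingIds = new TreeSet<>();     // "datastore" of IDs
    private final Map<Long, String> tasksByTag = new HashMap<>(); // "pull queue"

    // Step 2: enqueue with a monotonically increasing ID as the tag,
    // and record the ID in the "datastore".
    public void add(long id, String payload) {
        pendingIds.add(id);
        tasksByTag.put(id, payload);
    }

    // Step 3: lease the task whose tag is the earliest outstanding ID,
    // then delete both the ID and the task (the post-processing cleanup).
    public String leaseNext() {
        Long earliest = pendingIds.pollFirst(); // query for the lowest ID
        return earliest == null ? null : tasksByTag.remove(earliest);
    }
}
```

Tasks come back in ID order no matter how they were enqueued, which is the FIFO simulation the steps describe.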

Ibrahim Arief
  • Is it actually possible to create monotonically increasing ids using sharded counters? I thought not - the function for allocating ids in the low level api datastore can generate ids but might not be in strict order I suspect. – mjaggard Apr 04 '12 at 22:13
  • @mjaggard: Thanks, but I'm not quite sure I understand what you meant. Sharded counters do not use the generate-ID function from the low-level datastore. Basically, we increment the counters ourselves, but we spread the "counter bucket" over several entities to reduce write contention. Thus, the counters would be monotonically increasing. Please correct me if I'm wrong, of course. – Ibrahim Arief Apr 05 '12 at 06:27
  • Yes, you can't generate monotonically increasing IDs with a sharded counter - sharded counters span multiple entity groups, so there's no way to guarantee that any number you generate with one is unique. Sharded counters are for counting things, not for assigning IDs. – Nick Johnson Apr 05 '12 at 06:53
  • @NickJohnson: Thanks for the correction Nick, this comes as a surprise for me, since I use sharded counters in lieu of timestamp for temporal ordering of entities on my production app. I would probably open a new question regarding this to clarify the behavior. – Ibrahim Arief Apr 05 '12 at 07:02
  • @IbrahimArief Sharded counters have never been intended for assigning IDs. Why don't you just use a timestamp? – Nick Johnson Apr 05 '12 at 08:47
  • @NickJohnson: I thought timestamps in GAE instance servers are not synchronized well enough for time-critical use cases, am I wrong? As I understand it, desynchronizations in the order of a dozen seconds or more would be common, which is more than what could be tolerated in our use case. (And in the asker's use case too, it appears) – Ibrahim Arief Apr 05 '12 at 15:36
  • @IbrahimArief Depends what you mean by 'time critical'. Servers are synchronized using NTP, just like every other server on the planet. – Nick Johnson Apr 06 '12 at 00:15
3

Ok. This is how I've done it.

1) Rest servlet that is called from the consumer:

    If the Event sequence doesn't match the Vehicle sequence (from the datastore)
        Create a task on a "wait" queue to call me again
    else
        Validate state
        Create a task on the "regular" queue (Event data is in the payload)

2) A worker servlet gets the task from the "regular" queue, and so on... (same pseudocode)

This way I can pause the "regular" queue in order to do data maintenance without losing events.
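The sequence-gate flow above can be modeled as a self-contained sketch (plain Java, no GAE APIs; for illustration, the "wait" queue's call-me-again behavior is collapsed into an immediate retry after each in-sequence event):

```java
import java.util.*;

// Toy model of the accepted flow: events that arrive ahead of the
// vehicle's expected sequence are parked on a "wait" queue and retried;
// in-sequence events go to the "regular" queue.
public class SequenceGate {
    private long vehicleSeq = 0;                          // last processed sequence ("datastore")
    private final Deque<Long> waitQueue = new ArrayDeque<>();
    public final List<Long> regularQueue = new ArrayList<>();

    // REST servlet: route the event by its sequence number.
    public void receive(long eventSeq) {
        if (eventSeq != vehicleSeq + 1) {
            waitQueue.addLast(eventSeq);                  // "wait" queue: call me again later
        } else {
            regularQueue.add(eventSeq);                   // state validation would go here
            vehicleSeq = eventSeq;
            retryWaiting();                               // parked events may now be in sequence
        }
    }

    private void retryWaiting() {
        // Re-deliver parked events; any still out of sequence are parked again.
        List<Long> parked = new ArrayList<>(waitQueue);
        waitQueue.clear();
        for (long seq : parked) receive(seq);
    }
}
```

Even if events 2 and 3 arrive before event 1, the "regular" queue ends up holding 1, 2, 3 in order.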

Thank you for your answers. My solution is a mix of them.

André Salvati
3

I think the simple answer is "no". However, partly to improve the situation, I am using a pull queue, pulling 1000 tasks at a time and then sorting them. If timing isn't important, you could sort them, put them into the datastore, and then complete a batch at a time. You still have to work out what to do with the tasks at the beginning and end of each batch, because they might be out of order relative to interleaving tasks in other batches.
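One way to handle that batch-boundary problem is to sort each leased batch by timestamp and only flush events older than a safety watermark, carrying the rest over into the next batch. A toy sketch (plain Java, no GAE API; the watermark policy is my assumption, not part of the answer):

```java
import java.util.*;

// Sketch of the batch approach: lease a batch of timestamps, merge in
// anything held back from the previous batch, sort, and only release
// events at or below the watermark so late stragglers can still be
// merged in order on the next round.
public class BatchSorter {
    private final List<Long> carryOver = new ArrayList<>(); // tail held back from the last batch

    public List<Long> processBatch(List<Long> leased, long watermark) {
        List<Long> batch = new ArrayList<>(carryOver);
        batch.addAll(leased);
        carryOver.clear();
        Collections.sort(batch);                 // order the combined batch by timestamp
        List<Long> ready = new ArrayList<>();
        for (long ts : batch) {
            if (ts <= watermark) ready.add(ts);  // safe to process now
            else carryOver.add(ts);              // hold for the next batch
        }
        return ready;
    }
}
```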

mjaggard
  • Timing is important. I forgot to mention that I need to process by timestamp order, which is generated on an embedded device inside the vehicle. – André Salvati Apr 02 '12 at 23:06
0

The best way to handle this, the distributed way or "App Engine way", is probably to modify your algorithm and data collection to work with just a timestamp, allowing arbitrary ordering of tasks.

Assuming this is not possible or too difficult, you could modify your algorithm as follows:

  1. when creating the task, don't put the data in the payload but in the datastore, in a Kind ordered by timestamp and stored as a child entity of whatever entity you're trying to update (Vehicle?). The timestamps should come from the client, not the server, to guarantee consistent ordering.

  2. run a generic task that fetches the data for the earliest timestamp, processes it, and then deletes it, inside a transaction.
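The two steps above can be sketched as a toy model (plain Java, no datastore API; a TreeMap stands in for the timestamp-ordered child Kind, and the fetch-process-delete method stands in for the transaction):

```java
import java.util.*;

// Toy model: events are stored as timestamp-ordered children of their
// vehicle, and a generic task repeatedly takes the earliest one,
// returns it for processing, and deletes it.
public class OrderedEventStore {
    // vehicle -> (client timestamp -> payload); TreeMap keeps timestamp order
    private final Map<String, TreeMap<Long, String>> byVehicle = new HashMap<>();

    // Step 1: store the event under its client-generated timestamp.
    public void put(String vehicle, long clientTimestamp, String payload) {
        byVehicle.computeIfAbsent(vehicle, v -> new TreeMap<>())
                 .put(clientTimestamp, payload);
    }

    // Step 2: "transaction" - fetch the earliest event, delete it, return it.
    public String processEarliest(String vehicle) {
        TreeMap<Long, String> events = byVehicle.get(vehicle);
        if (events == null || events.isEmpty()) return null;
        return events.pollFirstEntry().getValue();
    }
}
```

Because ordering comes from the client timestamp rather than task delivery order, it doesn't matter how the queue reorders the generic tasks.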

Julian Go
0

Following this thread, I am unclear as to whether the strict FIFO requirement applies to all transactions received, or only on a per-vehicle basis. The latter gives you more options than the former.

stevep
0

You can put the work to be done in a row in the datastore with a creation timestamp and then fetch work tasks ordered by that timestamp, but if your tasks are being created too quickly you will run into latency issues.

Rick Mangi
0

I don't know the answer myself, but it may be possible that tasks enqueued using a deferred function execute in the order submitted. You will likely need an engineer from G. to get a definitive answer. Pull queues, as suggested, seem a good alternative, and would also allow you to consider batching your put()s.

One note about sharded counters: they increase the probability of monotonically increasing ids, but do not guarantee them.

stevep
  • deferred uses the task queue. – Nick Johnson Apr 05 '12 at 06:55
  • Right, I just didn't know if there might be some magic about tasks entering the queue from a deferred function vs. standard enqueueing. Sounds like the answer is "No", which is what I'd have guessed. – stevep Apr 05 '12 at 18:09