GAE transaction failure and idempotency

Question

The Google App Engine documentation contains this paragraph:

Note: If your application receives an exception when committing a transaction, it does not always mean that the transaction failed. You can receive DatastoreTimeoutException, ConcurrentModificationException, or DatastoreFailureException exceptions in cases where transactions have been committed and eventually will be applied successfully. Whenever possible, make your Datastore transactions idempotent so that if you repeat a transaction, the end result will be the same.

Wait, what? It seems like there's a very important class of transactions that just simply cannot be made idempotent because they depend on current datastore state. For example, a simple counter, as in a like button. The transaction needs to read the current count, increment it, and write out the count again. If the transaction appears to "fail" but doesn't REALLY fail, and there's no way for me to tell that on the client side, then I need to try again, which will result in one click generating two "likes." Surely there is some way to prevent this with GAE?

Edit:

It seems that this is a problem inherent in distributed systems, according to none other than Guido van Rossum -- see this link:

app engine datastore transaction exception

So it looks like designing idempotent transactions is pretty much a must if you want a high degree of reliability.

I was wondering if it was possible to implement a global system across a whole app for ensuring idempotency. The key would be to maintain a transaction log in the datastore. The client would generate a GUID and include that GUID with the request (the same GUID would be re-sent on retries of the same request). On the server, at the start of each transaction, it would look in the datastore for a record in the Transactions entity group with that ID. If it found one, then this is a repeated transaction, so it would return without doing anything. If not, it would do the work and write the GUID record as part of the same transaction.

Of course this would require enabling cross-group transactions, or having a separate transaction log as a child of each entity group. Also there would be a performance hit if failed entity key lookups are slow, because almost every transaction would include a failed lookup, because most GUIDs would be new.

As for the additional $ cost of the extra datastore interactions, this would probably still be less than if I had to make every transaction idempotent, since that would require a lot of checking what's in the datastore at each level.
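
A minimal Java sketch of the GUID idea described above, for illustration only. It uses an in-memory set as a stand-in for the transaction log, and the names DedupCounter, appliedRequestIds and increment are hypothetical. In a real app the log lookup, the log write and the counter update would all have to happen inside one datastore transaction.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical in-memory stand-in for one entity group: a counter plus a log
// of request IDs that have already been applied. This is not datastore code.
class DedupCounter {
    private final Set<String> appliedRequestIds = new HashSet<>();
    private long count = 0;

    // Applies the increment at most once per requestId.
    // Returns true if this call changed the counter, false if it was a replay.
    synchronized boolean increment(String requestId) {
        if (!appliedRequestIds.add(requestId)) {
            return false; // ID already logged: a retry of an earlier request
        }
        count++;
        return true;
    }

    synchronized long getCount() {
        return count;
    }
}
```

The client would generate one ID per click (for example with java.util.UUID.randomUUID()) and send the same ID again on every retry, so a retry after a spurious exception leaves the count unchanged.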

Have a read of Nick Johnson's article on distributed transactions - http://blog.notdot.net/2009/9/Distributed-Transactions-on-App-Engine — Tim Hoffman, Jul 18 '13 at 14:40
That's very interesting. I'm trying to think about how to apply that technique to the task of creating a reliable counter. It's easy enough if only one user has access to the counter: presumably the client knows the current value of the counter, so simply send the expected next value of the counter to the DB, rather than sending a message saying "increment it." However the stumbling block I'm running into is this: how would you implement this if multiple users could (possibly concurrently) increment the counter. It seems like there ought to be a way that doesn't involve keeping a log. — eeeeaaii, Jul 18 '13 at 16:19
There is a lot of discussion about counters in the datastore. If you have many concurrent, high-frequency updates, you may find you need to shard counters to get throughput. — Tim Hoffman, Jul 18 '13 at 16:32
In my experience, there is always a way to make an operation idempotent. In your example, the operation can be idempotent because a user can like a post only once. — Tony Baguette, Jul 27 '13 at 17:01
Last comment is not true. If datastore returns error then the app should retry the write. Thus you can end up with two writes. — Zig Mandel, Feb 26 '14 at 20:06
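
The "send the expected next value" idea from the comments above can be sketched in Java roughly as follows. This is an in-memory illustration, not datastore code, and ExpectedValueCounter, setNext and Result are hypothetical names. It assumes the check and the write would run inside a single transaction.

```java
// Hypothetical in-memory stand-in for a counter entity; in a real app the
// check and the write would happen inside one datastore transaction.
class ExpectedValueCounter {
    enum Result { APPLIED, ALREADY_APPLIED, CONFLICT }

    private long count = 0;

    // The client sends the value it expects the counter to have afterwards,
    // instead of sending "increment".
    synchronized Result setNext(long expectedNext) {
        if (count == expectedNext) {
            return Result.ALREADY_APPLIED; // looks like a replay of an applied write
        }
        if (count + 1 != expectedNext) {
            return Result.CONFLICT; // the client's view of the counter is stale
        }
        count = expectedNext;
        return Result.APPLIED;
    }

    synchronized long getCount() {
        return count;
    }
}
```

With a single writer, a retry of an applied request simply reports ALREADY_APPLIED. With several writers the sketch shows the stumbling block from the comment: if two users both send the same expected value, the second one also sees ALREADY_APPLIED even though its own like was never counted.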

score 7 · Accepted Answer · edited Apr 01 '14 at 08:56

dan wilkerson, simon goldsmith, et al. designed a thorough global transaction system on top of app engine's local (per entity group) transactions. at a high level, it uses techniques similar to the GUID one you describe. dan dealt with "submarine writes," ie the transactions you describe that report failure but later surface as succeeded, as well as many other theoretical and practical details of the datastore. erick armbrust implemented dan's design in tapioca-orm.

i don't necessarily recommend that you implement his design or use tapioca-orm, but you'd definitely be interested in the research.

in response to your questions: plenty of people implement GAE apps that use the datastore without idempotency. it's only important when you need transactions with certain kinds of guarantees like the ones you describe. it's definitely important to understand when you do need them, but you often don't.

the datastore is implemented on top of megastore, which is described in depth in this paper. in short, it uses multi-version concurrency control within each entity group and Paxos for replication across datacenters, both of which can contribute to submarine writes. i don't know if there are public numbers on submarine write frequency in the datastore, but if there are, searches with these terms and on the datastore mailing lists should find them.

amazon's S3 isn't really a comparable system; it's more of a CDN than a distributed database. amazon's SimpleDB is comparable. it originally only provided eventual consistency, and eventually added a very limited kind of transactions they call conditional writes, but it doesn't have true transactions. other NoSQL databases (redis, mongo, couchdb, etc.) have different variations on transactions and consistency.

basically, there's always a tradeoff in distributed databases between scale, transaction breadth, and strength of consistency guarantees. the best-known statement of this is eric brewer's CAP theorem, whose three axes are consistency, availability, and partition tolerance.

So what is interesting is that the paper talks about a submarine write being a situation where a write happens, but reads return stale data. To me that seems like less of a concern. The more problematic issue is that the App Engine documentation indicates that in the case of a submarine write, an exception is thrown, making it so that the client will think it has to retry. — eeeeaaii, Jul 22 '13 at 03:02
another thing that is interesting is that this paper seems to contradict what Guido van Rossum says in the link above -- submarine writes in particular seem to be specific to app engine, and he says specifically it is an optimization decision made by the app engine team. So, in general, the CAP theorem sure, but submarine writes are specifically app engine's issue. He also makes the important point that transaction order is never compromised, so if redoing a transaction would cause an error, you shouldn't have a problem (seems like a shortcut around true idempotency). — eeeeaaii, Jul 22 '13 at 10:35
To clarify the second comment: let's say you're creating a record and the record's key is fully determined by information sent by the client (for example, a registration page would send a username which becomes the key). Assume a submarine write and a spurious error of type "retry" (such as ConcurrentModificationException). The client retries, the record already exists, a different error is thrown which is NOT of type retry, and the user sees an error but is in fact registered. Not the most user-friendly result but at least your data isn't corrupted, and submarines are rare, right? — eeeeaaii, Jul 22 '13 at 10:41
you're right, i was conflating different issues. submarine writes can happen when a commit returns either success or failure, but iirc they're generally an artifact of paxos-based replication. you and guido are probably talking about a different kind of failure mode. and yes, both kinds should generally be very rare, although maybe occasionally bursty, so guido's idempotency shortcut is a reasonable technique. — ryan, Jul 22 '13 at 19:40
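
The registration example from the comments above can be sketched in Java like this. It is an in-memory illustration under the assumption that the record key is fully determined by the client's input. Registrations, register and emailOf are hypothetical names, and a ConcurrentHashMap stands in for the datastore.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical in-memory stand-in for records keyed by username.
class Registrations {
    private final Map<String, String> emailByUsername = new ConcurrentHashMap<>();

    // Creates the record only if no record with this key exists yet.
    // Returns true if the record was created, false if the key was taken.
    boolean register(String username, String email) {
        return emailByUsername.putIfAbsent(username, email) == null;
    }

    // Returns the stored email, or null if the username is not registered.
    String emailOf(String username) {
        return emailByUsername.get(username);
    }
}
```

If the first call went through but the client saw a spurious error, the retry returns false and changes nothing. The user may see an "already exists" message, but the stored data is not duplicated or overwritten.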

score 1 · Answer 2 · answered May 01 '14 at 21:05

The best way I came up with to make counters idempotent is to count with a set instead of an integer. Thus, when a person "likes" something, instead of incrementing a counter I add the like to the thing like this:

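import java.util.HashSet;
import java.util.Set;

class Thing {
  // Users who liked this thing. Any Set works; User needs equals() and hashCode().
  private final Set<User> likes = new HashSet<>();

  // Idempotent: adding the same user again leaves the set unchanged.
  public void like(User u) {
    likes.add(u);
  }

  public int getLikeCount() {
    return likes.size();
  }
}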

this is in java, but i hope you get my point even if you are using python.

This method is idempotent: you can add the same user as many times as you like, and it will only be counted once. Of course, it has the penalty of storing a huge set instead of a simple counter. But hey, don't you need to keep track of likes anyway? If you don't want to bloat the Thing object, create another object ThingLikes, and cache the like count on the Thing object.
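
A rough sketch of the ThingLikes variant with a cached count, as one possible reading of the suggestion above. It uses plain in-memory Java objects and identifies users by a string ID for simplicity. ThingWithCachedCount and cachedLikeCount are hypothetical names, and persistence is left out.

```java
import java.util.HashSet;
import java.util.Set;

// Holds the (potentially large) set of users who liked a thing.
class ThingLikes {
    private final Set<String> userIds = new HashSet<>();

    // Returns true if the user was newly added, false if already present.
    boolean add(String userId) {
        return userIds.add(userId);
    }

    int size() {
        return userIds.size();
    }
}

// Keeps only a cached count on the main object.
class ThingWithCachedCount {
    private final ThingLikes likes = new ThingLikes();
    private int cachedLikeCount = 0;

    void like(String userId) {
        likes.add(userId);
        // Recompute from the set instead of incrementing, so a repeated
        // like() for the same user cannot inflate the cached value.
        cachedLikeCount = likes.size();
    }

    int getLikeCount() {
        return cachedLikeCount;
    }
}
```

The cached count is assigned from the set size rather than incremented, which is what keeps a retried like from being counted twice.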

score 0 · Answer 3 · edited May 23 '17 at 10:30


another option worth looking into is app engine's built in cross-group transaction support, which lets you operate on up to five entity groups in a single datastore transaction.

if you prefer reading on stack overflow, this SO question has more details.

answered Feb 26 '14 at 20:50

ryan
