
I have the following piece of code in a method, which is decorated as transactional:

@ndb.transactional(retries=2, xg=True)
def add_insights(self, insights, countdown=True, forced_finish=False):
    ...
    thing = tkey.get()
    logging.debug(thing.open_enrichments)
    thing.insights += insight_keys
    if countdown and not forced_finish:
        thing.open_enrichments -= 1
        if thing.open_enrichments < 0:
            thing.open_enrichments = 0
    elif forced_finish:
        thing.open_enrichments = 0
    logging.debug(thing.open_enrichments)
    thing.put()

This method runs in many concurrent tasks, which may access the same "thing" entity in NDB. When I check the log files (debug statements simplified for clarity here), it seems that even if this code fails in one task, another task can still start with the decremented counter open_enrichments written by the failed task.

I verified this by sorting the debug statements by timestamp.

The counter therefore reaches 0 much too quickly. open_enrichments is initially set to 8 but is effectively decremented 12 or 13 times (as seen by other tasks reading the counter with a key.get()), which contradicts what I learned about transactions in NDB.

EDIT:

To clarify the sequence:

  • Task A enters this piece of code with open_enrichments = 5
  • Task A leaves this piece of code with open_enrichments = 4 and fails, because in the meantime
  • Task B entered this piece of code with open_enrichments = 4 (not 5)! It seems the `thing = tkey.get()` already returned the changed entity
  • Task B leaves this piece of code with open_enrichments = 3 and commits successfully
  • Task A re-enters this piece of code with open_enrichments = 3
  • Task A leaves this piece of code with open_enrichments = 2 and commits

So the two tasks committed successfully only twice in total, but the counter was decremented by 3!
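The sequence above can be reproduced as a minimal, framework-free simulation (no NDB involved): a decrement whose commit actually lands but is reported to the caller as a failure gets retried, so the same unit of work is applied twice.

```python
# Plain-Python simulation of the anomaly: task A's decrement commits,
# but A is told it failed and retries, so its work is counted twice.

counter = 5

def decrement():
    global counter
    counter -= 1
    return counter

decrement()  # Task A runs; the commit actually lands (5 -> 4),
             # but a transient error makes A believe it failed.
decrement()  # Task B runs normally (4 -> 3).
decrement()  # Task A retries its "failed" transaction (3 -> 2).

# Two logical units of work, but three decrements:
print(counter)  # 2 instead of 3
```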

Markus Breuer
  • Is the task failing, and retrying, because of something outside the transaction? – Greg Jan 11 '14 at 10:38
  • It is not the task, which fails, but the method (because one instance of the transactional method 'collides' with another instance in another task). So the method is retried - up to 2 times as the @decorator specifies. – Markus Breuer Jan 11 '14 at 11:11
  • I think you need to use parent/ancestor for strong consistency: https://developers.google.com/appengine/docs/python/datastore/structuring_for_strong_consistency – jacek2v Jan 11 '14 at 21:18
  • Well, if I use this approach ... all entities (rows) in this group fall under the rule of "less than 1 write ops per second", which is a rather strict limitation, if you have a few hundred entities you are handling here with concurrent tasks. – Markus Breuer Jan 16 '14 at 09:15
  • How do you know that task A is failing? The log entries won't tell you, and the timing won't be consistent. I.e., if Task A enters and leaves, then Task B enters and leaves, the log entries might go: Task A enters, Task B enters, Task A leaves, Task B leaves. – Emlyn O'Regan Feb 22 '17 at 07:08

1 Answer


App Engine transactions can actually commit and still be reported as failed, and then be retried (3 times by default). See the first note on this page.

So it is really important that your transaction is idempotent. Can you set open_enrichments to a given number instead of decrementing it inside the transaction?
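One common way to make such an update idempotent is to have the entity remember which units of work it has already counted, so replaying the same transaction is a no-op. A minimal, framework-free sketch of the idea (the `Thing` class and `apply_batch` helper here are hypothetical illustrations, not part of NDB; in NDB the applied IDs would be a repeated property on the entity):

```python
# Hypothetical sketch of an idempotent counter update: instead of blindly
# decrementing, the entity records which batch IDs it has already applied,
# so a retried (or double-committed) transaction leaves it unchanged.

class Thing:
    def __init__(self, open_enrichments):
        self.open_enrichments = open_enrichments
        self.applied_batches = set()  # IDs of batches already counted

def apply_batch(thing, batch_id):
    """Decrement the counter for batch_id exactly once."""
    if batch_id in thing.applied_batches:
        return  # replay of an already-applied transaction: do nothing
    thing.applied_batches.add(batch_id)
    thing.open_enrichments = max(0, thing.open_enrichments - 1)

thing = Thing(open_enrichments=5)
apply_batch(thing, "task-A")   # 5 -> 4
apply_batch(thing, "task-B")   # 4 -> 3
apply_batch(thing, "task-A")   # retry of task A: no change
print(thing.open_enrichments)  # 3
```

With this shape, a retry after a successful-but-misreported commit re-sees its own batch ID and does nothing, so the counter ends up correct regardless of how many times the transaction function runs.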

FoxyLad