
Our GAE app makes a local copy of another website's relational database in NDB. There are four entity kinds: User, Table, Row, and Field. Each user has a number of tables, each table a number of rows, and each row a number of fields.

SomeUser > SomeTable > ARow > AField

Thus, each User becomes one entity group. I need a feature that clears out all the tables (and their rows) for a given user. What is the right way to delete all the tables and all their rows while avoiding the entity-group contention limit of ~5 operations/second?
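For reference, the models look roughly like this (a sketch; the kind names and the 'service' property come from the code below, everything else is an assumption):

from google.appengine.ext import ndb

class User(ndb.Model):
    pass  # root of the entity group; real properties omitted

class Tables(ndb.Model):
    # created with parent=user.key, so every table lives in its
    # user's entity group
    service = ndb.StringProperty()

class TableRows(ndb.Model):
    pass  # created with parent=<table key>

class Fields(ndb.Model):
    # created with parent=<row key>; still a descendant of the table,
    # which is why Fields.query(ancestor=<table key>) below finds them
    pass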

The current code is getting TransactionFailedErrors because of contention on the entity group. (A detail I overlooked: we only want to delete tables whose 'service' attribute is set to a certain value.)

def delete_tables_for_service(user, service):
    # Ancestor query: only this user's tables whose 'service' matches.
    tables = Tables.query(Tables.service == service, ancestor=user.key).fetch(keys_only=True)
    for table in tables:  # 'table' is a Key because of keys_only=True
        keys = []
        # Ancestor queries match all descendants of the table key, so
        # this collects the fields on every row of the table.
        keys += Fields.query(ancestor=table).fetch(keys_only=True)
        keys += TableRows.query(ancestor=table).fetch(keys_only=True)
        keys.append(table)
        # One non-transactional batch delete per table.
        ndb.delete_multi(keys)
Neil
    It looks like you're deleting each Table entity (and associated child entities) separately. Without seeing the context in your code (and not being familiar with transactions in Python) I can't tell if this is a transaction per table. Have you tried moving your `ndb.delete_multi(keys)` outside the `for` loop so it does one delete for all tables? Otherwise, for a User with 10 Tables it looks like you're possibly trying 10 transactions. – tx802 Sep 15 '14 at 16:29
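A minimal sketch of that suggestion, collecting every key first and issuing a single batch delete (assuming the models from the question):

def delete_tables_for_service(user, service):
    keys = []
    tables = Tables.query(Tables.service == service, ancestor=user.key).fetch(keys_only=True)
    for table in tables:
        keys += Fields.query(ancestor=table).fetch(keys_only=True)
        keys += TableRows.query(ancestor=table).fetch(keys_only=True)
        keys.append(table)
    # One batched delete call instead of one per table.
    ndb.delete_multi(keys)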

2 Answers


If all of the entities you're deleting are in one entity group, try deleting them all in one transaction. Without an explicit transaction, each delete is occurring in its own transaction, and all of the transactions have to line up (via contention and retries) to change the entity group.
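A minimal sketch of that approach (not from the original answer; the retry count is an arbitrary choice). Inside the transaction the per-table delete_multi calls are buffered and applied as a single commit, so the entity group is written once:

@ndb.transactional(retries=10)
def delete_tables_for_service(user, service):
    tables = Tables.query(Tables.service == service, ancestor=user.key).fetch(keys_only=True)
    for table in tables:  # ancestor queries are allowed inside transactions
        keys = [table]
        keys += Fields.query(ancestor=table).fetch(keys_only=True)
        keys += TableRows.query(ancestor=table).fetch(keys_only=True)
        # Buffered until the transaction commits.
        ndb.delete_multi(keys)
    # Caveat: a very large entity group may exceed transaction size
    # limits, in which case the work has to be split up.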

Dan Sanderson
  • This is the fix I suspect will work, but semantically it makes no sense to me, as I don't need the deletion of one row to be dependent on the deletion of another. If either one goes through, that should be committed. Grouping them together in a transaction just to get around the contention limit feels like a hack. – Neil Sep 16 '14 at 10:37
  • It's better to visualize that the "row" in the datastore is actually the entity group, not the entity. With all the entities in one group, you're actually modifying one row multiple times. Putting it all in a transaction simply does all the group updates at once, which is what you intend in this case. – Dan Sanderson Sep 16 '14 at 20:33
  • (This is slightly confusing because entities of kinds, not entity groups, are the results of queries. This is because queries act on index rows, not the original entity group records.) – Dan Sanderson Sep 16 '14 at 20:36
  • By just attempting the same Task (which calls the delete-tables function) repeatedly, we somehow managed to get it done. Marking this as the accepted answer; I will test at a later stage whether it actually improves performance. – Neil Sep 17 '14 at 15:35

Are you sure it's contention, and not simply that the code above is already executing inside a transaction? A quick fix might be to increase the number of retries and turn on cross-group transactions for this method:

@ndb.transactional(retries=5, xg=True)

You can read more about that here: https://developers.google.com/appengine/docs/python/ndb/transactions. If that's not the culprit, consider deferring the deletes or running them asynchronously so they execute over time in smaller batches. The trick with NDB is to do small bursts of work regularly, rather than a large chunk of work infrequently. Here is one way to turn that code into an asynchronous unit of work:

def delete_tables_for_service(user, service):
    tables = Tables.query(Tables.service == service, ancestor=user.key).fetch(keys_only=True)
    for table in tables:  # 'table' is a Key because of keys_only=True
        # Delete fields
        fields_keys = Fields.query(ancestor=table).fetch(keys_only=True)
        ndb.delete_multi_async(fields_keys)

        # Delete table rows
        table_rows_keys = TableRows.query(ancestor=table).fetch(keys_only=True)
        ndb.delete_multi_async(table_rows_keys)

        # Finally delete the table itself ('table' is already a key, so
        # pass it directly; table.key would raise an AttributeError)
        ndb.delete_async(table)
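One caveat with the sketch above: the *_async calls return futures that nothing waits on, so the request could finish before the deletes do. A minimal way to handle that (an addition, not part of the original answer) is to collect the futures and block at the end, or to decorate the handler with @ndb.toplevel:

def delete_tables_for_service(user, service):
    futures = []
    tables = Tables.query(Tables.service == service, ancestor=user.key).fetch(keys_only=True)
    for table in tables:
        # delete_multi_async returns a list of futures,
        # delete_async a single future
        futures.extend(ndb.delete_multi_async(
            Fields.query(ancestor=table).fetch(keys_only=True)))
        futures.extend(ndb.delete_multi_async(
            TableRows.query(ancestor=table).fetch(keys_only=True)))
        futures.append(ndb.delete_async(table))
    # Block until every queued delete RPC has completed.
    ndb.Future.wait_all(futures)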

If you want more control over the deletes, retries, and failures, you can either use Task Queues, or simply use the deferred library (https://developers.google.com/appengine/articles/deferred):

  1. Turn deferred on in your app.yaml (see the sketch after the code below)
  2. Replace the ndb.delete_multi call with a deferred task:

    from google.appengine.ext import deferred

    def delete_tables_for_service(user, service):
        tables = Tables.query(Tables.service == service, ancestor=user.key).fetch(keys_only=True)
        for table in tables:
            keys = []
            keys += Fields.query(ancestor=table).fetch(keys_only=True)
            keys += TableRows.query(ancestor=table).fetch(keys_only=True)
            keys.append(table)
            # Enqueue a task that performs this table's delete later,
            # spreading the writes out over time.
            deferred.defer(_deferred_delete_tables_for_keys, keys)

    def _deferred_delete_tables_for_keys(keys):
        # Runs in its own task; failed tasks are retried by the queue.
        ndb.delete_multi(keys)
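For step 1, the deferred handler is enabled via the builtins directive in app.yaml (a sketch of just the relevant lines, assuming the Python 2.7 runtime):

    builtins:
    - deferred: on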
Bardia D.