
I would like to sync my Cloud Datastore contents with an index in ElasticSearch, so that the ES index is always up to date with the contents of Datastore.

I noticed that an equivalent mechanism is available in the App Engine Python Standard Environment by implementing a _post_put_hook method on a Datastore Model. However, this doesn't seem to be possible using the google-cloud-datastore library available for use in the flexible environment.

Is there any way to receive a callback after every insert? Or will I have to put up a "proxy" API in front of the datastore API which will update my ES index after every insert/delete?

Chaitanya Nettem

1 Answer


The _post_put_hook() of ndb.Model only works if you have written the entity to Datastore through NDB, and yes, unfortunately the NDB library is only available in the App Engine Python Standard Environment. I don't know of such a feature in Cloud Datastore. If I remember correctly, Firebase Realtime Database and Firestore have triggers for writes, but I guess you are not eager to migrate the database either.

In Datastore you would either need a "proxy" API in front of it, as you suggested, or you would need to modify your Datastore client(s) to update the index after every successful write operation. The latter comes with a higher risk of failures and stale data in ElasticSearch, especially if the clients are outside your control.
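For illustration, a minimal sketch of the client-side variant in the flexible environment, assuming the google-cloud-datastore and elasticsearch Python clients; the index name and the put_and_index() wrapper are made up for this example:

```
from google.cloud import datastore
from elasticsearch import Elasticsearch

datastore_client = datastore.Client()
es = Elasticsearch()  # assumes an ES cluster reachable with default settings


def put_and_index(entity, index='my-entities'):
    """Write an entity to Datastore and, if that succeeds, mirror it to ES."""
    # put() raises on failure, so nothing gets indexed for a failed write.
    datastore_client.put(entity)
    es.index(index=index,
             id=entity.key.id_or_name,
             body=dict(entity))  # 'body' is the pre-8.x elasticsearch-py argument
```

Every writer would have to go through such a wrapper, which is why this approach becomes fragile as soon as a client is outside your control.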

I believe a custom API makes sense if consistent and up-to-date search records are important for your use cases. Datastore with Python / NDB (maybe with Cloud Endpoints) would be a good approach.

I have a similar solution running on GAE Python Standard (although with the built-in Search API instead of ElasticSearch). If you choose this route you should be aware of a few potential caveats:

  1. _post_put_hook() is always called, even if the put operation failed. I have added a code sample below. You can find more details in the docs on model hooks, hook methods, and check_success().

  2. Exporting the data to ElasticSearch or the Search API will prolong your response time. This might not be an issue for background tasks; just call the export feature inside _post_put_hook(). But if a user made the request, this could be a problem. For these cases, you can defer the export operation to a separate task, either by using the deferred.defer() method or by creating a push task. They are more or less the same; below I use defer(), and a rough push-task variant follows this list.

  3. Add a class method to every kind for which you want to export search records. Whenever something goes wrong, or you move apps / datastores, or add new search indexes, you can call this method, which will then query all entities of that kind from Datastore batch by batch and export their search records (a sketch of such a method follows the deferred example below).
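For completeness, the push-task variant of the second caveat could look roughly like this; the queue name and the /tasks/export_to_search handler URL are made up, and a handler mapped to that URL would perform the actual export:

```
from google.appengine.api import taskqueue
from google.appengine.ext import ndb


class CustomModel(ndb.Model):
    def _post_put_hook(self, future):
        try:
            future.check_success()  # raises if the put failed
            # Enqueue a push task instead of deferred.defer(); the handler
            # behind the (hypothetical) URL does the export to ElasticSearch.
            taskqueue.add(url='/tasks/export_to_search',
                          params={'key': self.key.urlsafe()},
                          queue_name='default')
        except Exception:
            pass  # put failed, nothing to export
```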

Example with deferred export:

```
import logging

from google.appengine.ext import deferred, ndb


class CustomModel(ndb.Model):
    def _post_put_hook(self, future):
        try:
            # check_success() raises the put's exception if the write failed
            # and returns None if it succeeded.
            if future.check_success() is None:
                deferred.defer(export_to_search, self.key)
        except Exception:
            logging.exception('Put failed, skipping search export')


def export_to_search(key=None):
    try:
        if key is not None:
            entity = key.get()
            if entity is not None:
                call_export_api(entity)  # your actual ElasticSearch / Search API export
    except Exception:
        logging.exception('Search export failed')
```
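For the third point, a rough sketch of such a batch re-export class method, reusing the deferred import and the export_to_search() helper from the example above; the batch size and method name are arbitrary:

```
class CustomModel(ndb.Model):
    # ... properties and _post_put_hook() as above ...

    @classmethod
    def export_all_to_search(cls, batch_size=100):
        """Re-export search records for every entity of this kind, batch by batch."""
        cursor = None
        more = True
        while more:
            keys, cursor, more = cls.query().fetch_page(
                batch_size, start_cursor=cursor, keys_only=True)
            for key in keys:
                deferred.defer(export_to_search, key)
```

You could then call CustomModel.export_all_to_search() from a cron job or an admin handler whenever the index needs to be rebuilt or backfilled.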

Ani