In my Google App Engine App, I have a large number of entities representing people. At certain times, I want to process these entities, and it is really important that I have the most up to date data. There are far too many to put them in the same entity group or do a cross-group transaction.
As a solution, I am considering storing a list of keys in Google Cloud Storage. I actually use the person's email address as the key name so I can store a list of email addresses in a text file.
When I want to process all of the entities, I can do the following:
- Read the file from Google Cloud Storage
- Iterate over the file in batches (say 100)
- Use
ndb.get_multi()
to get the entities (this will always give the most recent data) - Process the entities
- Repeat with next batch until done
Are there any problems with this process or is there a better way to do it?