Every Cloud Datastore query computes its results using one or more indexes, which contain entity keys in a sequence specified by the index's properties and, optionally, the entity's ancestors. The indexes are updated incrementally to reflect any changes the application makes to its entities, so that the correct results of all queries are available with no further computation needed.

Generally, I would like to know if

datastore.get(List<Key> listOfKeys);

is faster or slower than a query with the index file prepared (with the same results).

Query q = new Query("Kind").setFilter(someFilter); // the filter is optional

My current problem:

My data consists of Layers and Points. Points belong to only one unique layer and have unique ids within a layer. I could load the points in several ways:

1) Have points with a "layer name" property and query with a filter. - Here I am not sure whether the datastore would have the results prepared, because the layer name changes dynamically.

2) Use only keys. The layer would have to store point ids.

KeyFactory.createKey("Layer", "layer name");
KeyFactory.createKey("Point", "layer name"+"x"+"point id");

3) Use queries without filters: I don't actually need the general kind "Point" and could be more specific: kind would be ("layer name"+"point id") - What are the costs of creating more kinds? Could this be the fastest way?

Can you actually find out how the datastore works in detail?

Dan McGrath
thehorseisbrown

1 Answer

faster or slower than a query with the index file prepared (with the same results).

Fundamentally a query and a get by key are not guaranteed to have the same results.

Queries are eventually consistent, while getting data by key is strongly consistent.

Your first challenge, before optimizing for speed, is probably ensuring that you're showing the correct data.
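
To make the difference concrete, here is a toy in-memory model in plain Java. It is not how Datastore is implemented and it uses none of its API; the class `ConsistencyToy` and all of its methods are made up for illustration. The only assumption it encodes is the one stated above: a get by key reads the entity itself, while a filter query reads an index that may lag behind the write.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model only: entity writes are visible at once, index rows are applied later. */
class ConsistencyToy {
    private final Map<String, String> entitiesByKey = new HashMap<>();       // key -> layer name
    private final Map<String, List<String>> indexByLayer = new HashMap<>();  // layer name -> keys
    private final List<String[]> pendingIndexRows = new ArrayList<>();       // not yet indexed

    /** Stores the entity immediately and queues its index row. */
    void put(String key, String layer) {
        entitiesByKey.put(key, layer);
        pendingIndexRows.add(new String[] {key, layer});
    }

    /** A lookup by key reads the entity itself, so it sees every completed put. */
    boolean getByKey(String key) {
        return entitiesByKey.containsKey(key);
    }

    /** A filter query reads only the index, which may be behind. */
    List<String> queryByLayer(String layer) {
        return new ArrayList<>(indexByLayer.getOrDefault(layer, new ArrayList<>()));
    }

    /** Simulates the index catching up some time after the write. */
    void applyPendingIndexRows() {
        for (String[] row : pendingIndexRows) {
            indexByLayer.computeIfAbsent(row[1], k -> new ArrayList<>()).add(row[0]);
        }
        pendingIndexRows.clear();
    }

    public static void main(String[] args) {
        ConsistencyToy store = new ConsistencyToy();
        store.put("point-1", "roads");
        System.out.println(store.getByKey("point-1"));           // true
        System.out.println(store.queryByLayer("roads").size());  // 0, index not updated yet
        store.applyPendingIndexRows();
        System.out.println(store.queryByLayer("roads").size());  // 1
    }
}
```

In this model the "save and reload" case is the middle line: the point exists, but the query does not return it yet.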

The docs explain eventual vs. strong consistency well. It sounds like you have the option of using an ancestor query, which can be strongly consistent. I would also strongly recommend against using the 'name', which is dynamic, as the entity's key name; this will cause you an excessive amount of grief.

Edit: In the interests of being specifically helpful, one option for a working solution based on your description would be:

  1. Give a unique id (a uuid probably) to each layer, store the name as a property
  2. Include the layer key as the parent key for each point entity
  3. Use an ancestor query when fetching points for a layer (which is strongly consistent)
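
The three steps above can be sketched in plain Java as an in-memory model. This is not the Datastore client API: `LayerStoreSketch`, its methods and the `"Layer:<uuid>/Point:<id>"` key format are all hypothetical, chosen only to show the shape of the design. In the real App Engine API the same idea would, I believe, be expressed by passing the layer's `Key` as the parent when constructing each point `Entity` and calling `setAncestor` on the `Query`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.UUID;

/** In-memory sketch of parent/child keys; not the Datastore client API. */
class LayerStoreSketch {
    // Entities sorted by key path, so a layer's points sit in one contiguous range.
    private final TreeMap<String, Map<String, Object>> entities = new TreeMap<>();

    /** Step 1: the layer key is a generated UUID; the name is an ordinary property. */
    String createLayer(String name) {
        String layerKey = "Layer:" + UUID.randomUUID();
        entities.put(layerKey, Map.of("name", name));
        return layerKey;
    }

    /** Step 2: the point key carries the parent layer key as its path prefix. */
    String addPoint(String layerKey, long pointId, double x, double y) {
        String pointKey = layerKey + "/Point:" + pointId;
        entities.put(pointKey, Map.of("x", x, "y", y));
        return pointKey;
    }

    /** Step 3: the "ancestor query" of this sketch is a prefix scan over the sorted keys. */
    List<String> pointsOf(String layerKey) {
        String prefix = layerKey + "/Point:";
        List<String> keys = new ArrayList<>();
        for (String key : entities.tailMap(prefix, true).keySet()) {
            if (!key.startsWith(prefix)) break; // left this layer's key range
            keys.add(key);
        }
        return keys;
    }

    /** Renaming rewrites one property; no key changes, so no point is touched. */
    void renameLayer(String layerKey, String newName) {
        entities.put(layerKey, Map.of("name", newName));
    }

    String nameOf(String layerKey) {
        return (String) entities.get(layerKey).get("name");
    }

    public static void main(String[] args) {
        LayerStoreSketch store = new LayerStoreSketch();
        String roads = store.createLayer("roads");
        store.addPoint(roads, 1, 0.0, 0.0);
        store.addPoint(roads, 2, 1.5, 2.5);
        store.renameLayer(roads, "highways");
        // prints "highways has 2 points"
        System.out.println(store.nameOf(roads) + " has " + store.pointsOf(roads).size() + " points");
    }
}
```

The point of the design is visible in `renameLayer`: because the name is not part of any key, a rename is a single write and the points stay reachable under the same parent.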

An alternative option is to store points as embedded entities and only have one entity for the whole layer - depends on what you're trying to achieve.

Nick
  • Thank you for pointing out the problem of consistency. I should have included that my layers would be edited and displayed to just a handful of users... And why should I worry about the 'name' as entity name? I would prevent people from saving a layer with an already used name... I can then recreate the keys very easily. – thehorseisbrown Oct 27 '16 at 20:14
  • Eventual consistency is a design constraint, not a function of the number of users. To understand the implication, deploy to app engine, save and reload. You almost definitely won't see the points you just saved. You will need to solve this problem. – Nick Oct 27 '16 at 21:09
  • If you use name as the entity name, rename is a put for the new record, a delete for the old one, and you will have to update every entity that references the layer. It's the same as using the name as the pk in a relational database - it impacts all logically connected entities. If you then try to do that transactionally you have a limit of 25 entity groups per XG transaction, so with this model you would have an issue if you had more than 23 points in a layer. I promise you this is a level of pain you just don't want to deal with. – Nick Oct 27 '16 at 21:11
  • oh I see.. it is really dumb to use the name :D thank you for explaining why! – thehorseisbrown Oct 27 '16 at 21:26
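
The arithmetic behind the "more than 23 points" remark in the comments can be written out as a small sketch. It only counts writes as described there (one put, one delete, one update per referencing point) and takes the limit of 25 per XG transaction from the comment; `RenameCost` and its methods are hypothetical names, not part of any API.

```java
/** Back-of-the-envelope count; the limit value is the one quoted in the comment thread. */
class RenameCost {
    static final int TRANSACTION_LIMIT = 25;

    /** Name used as key: put new layer + delete old layer + update each referencing point. */
    static int touchedWithNameAsKey(int points) {
        return 2 + points;
    }

    /** Stable id as key: only the layer's name property is rewritten. */
    static int touchedWithStableId(int points) {
        return 1;
    }

    static boolean fitsInOneTransaction(int touched) {
        return touched <= TRANSACTION_LIMIT;
    }

    public static void main(String[] args) {
        for (int points : new int[] {23, 24}) {
            System.out.println(points + " points: name-as-key fits="
                    + fitsInOneTransaction(touchedWithNameAsKey(points))
                    + ", stable-id fits="
                    + fitsInOneTransaction(touchedWithStableId(points)));
        }
        // 23 points: name-as-key fits=true, stable-id fits=true
        // 24 points: name-as-key fits=false, stable-id fits=true
    }
}
```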