I have an app where I am creating a large number of entities. I don't want to put them in the same entity group, because I could be creating a lot of them in a short period of time -- say 1 million in 24 hours.

At certain points, I want to get all of these entities with a query like this:

Foo.all()

How long do I need to wait after the last Foo entity is created to be highly likely to get all of the Foo entities with this query?

EDIT:

From this question, it seems that I can't get all my entities right away. Would be great to know how long I should wait.

new name
  • The entities should exist very quickly, but creating a large number of entities will be really expensive. – lucemia Mar 07 '13 at 01:32
  • Also, when actually retrieving them you will run out of time. You won't be able to get all of them, but you will be able to filter them and extract whatever you want. – Lipis Mar 07 '13 at 01:41
  • @Lipis, I would iterate over them in a task or backend so I don't think time is a problem. – new name Mar 07 '13 at 01:49

1 Answer

Apart from the whole thing being quite expensive, you will be able to get all your entities right away.

Note that iterating through millions of entities will require using Tasks. If that is not enough, since tasks have a deadline of 10 minutes, you should consider using Backends.
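To make the point about task deadlines concrete, here is a minimal sketch of processing entities page by page and stopping before a time budget runs out, so that a follow-up task can resume where this one left off. The names `process_in_batches`, `fetch_page`, and `fake_fetch_page` are hypothetical and not part of any App Engine API. On App Engine, `fetch_page` would presumably wrap `Foo.all()` together with a query cursor; here an in-memory list stands in for the datastore so the sketch runs anywhere.

```python
import time


def process_in_batches(fetch_page, handle, cursor=None, batch_size=500,
                       budget_seconds=540.0, clock=time.time):
    """Process entities page by page until done or the time budget runs out.

    fetch_page(cursor, limit) -> (entities, next_cursor) is a hypothetical
    adapter around the real query. Returns None when everything has been
    processed, otherwise the cursor a follow-up task should resume from.
    """
    deadline = clock() + budget_seconds
    while True:
        entities, next_cursor = fetch_page(cursor, batch_size)
        for entity in entities:
            handle(entity)
        if len(entities) < batch_size:
            return None  # short page: nothing left to read
        cursor = next_cursor
        if clock() >= deadline:
            return cursor  # out of time: hand this cursor to the next task


# In-memory stand-in for the datastore, for illustration only.
DATA = list(range(1200))


def fake_fetch_page(cursor, limit):
    """Return one page of DATA and the position to continue from."""
    start = cursor or 0
    page = DATA[start:start + limit]
    return page, start + len(page)


seen = []
resume_from = process_in_batches(fake_fetch_page, seen.append, batch_size=500)
```

When the function returns a cursor instead of `None`, the caller would enqueue another task with that cursor. The default budget of 540 seconds is an assumption chosen to leave some headroom under the 10 minute deadline mentioned above.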

Lipis
  • So it sounds like for practical purposes for the above problem, GAE behavior is more or less strongly consistent. I'd appreciate it if you could point me to a resource that explains your answer. – new name Mar 07 '13 at 02:02
  • In my personal experience I never had 1M writes per day, but around 100K wasn't a problem. Also, nothing in the documentation points out any latency for reading the data. This post might also be useful for your decision: http://bjk5.com/post/30813320623/what-traffic-from-60-minutes-looks-like – Lipis Mar 07 '13 at 02:16
  • @Kekito While this link doesn't mention anything about writes and only covers serving the app, you can still see the load that Google App Engine can handle: http://googleappengine.blogspot.dk/2011/05/royal-wedding-bells-in-cloud.html So 1M writes per day shouldn't be a problem :) – Lipis Mar 07 '13 at 02:19
  • With just 1 million writes spread over 24 hours you probably won't run into consistency issues at all. From my observations, most index updates complete within milliseconds, while a small portion (around 0.01%) remains inconsistent for a couple of minutes. Under heavy load, however (1,000,000 updates within 20 minutes), I've seen inconsistencies for up to 4 hours. – as5wolf Mar 07 '13 at 09:16
  • To minimize inconsistencies (and other load-dependent issues), avoid creating hotspots in the underlying Bigtable. That is, do not use (auto-generated) ascending keys and do not index on sequential (monotonic) data like timestamps or counters. – as5wolf Mar 07 '13 at 09:34
  • Lipis, seems that your answer is a bit optimistic. Please see the edit to my question. – new name Mar 17 '13 at 23:13
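
The advice in the comments above about avoiding ascending keys can be sketched roughly as follows. This is only one possible approach, not something the commenters spelled out: prefix each sequential id with a few characters of its hash so that consecutive writes land in different parts of the key space. The helper name `scattered_key_name` is hypothetical. On App Engine the result would presumably be passed as the `key_name` when constructing the entity.

```python
import hashlib


def scattered_key_name(sequence_id):
    """Derive a non-monotonic key name from a sequential integer id.

    The hash prefix spreads consecutive ids across the key space, and the
    id suffix keeps every name unique even if two prefixes collide.
    """
    digest = hashlib.sha256(str(sequence_id).encode("utf-8")).hexdigest()
    return "%s-%d" % (digest[:8], sequence_id)


# Consecutive ids no longer produce lexicographically ascending names.
names = [scattered_key_name(i) for i in range(1000)]
```

Note that this only addresses hotspots caused by the keys themselves. An indexed property holding a timestamp or counter would still produce sequential index rows.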