I have a datastore with around 1,000,000 entities in a model. I want to fetch 10 random entities from this.
I am not sure how to do this. Can someone help?
Assign each entity a random number and store it in the entity. Then query for ten records whose random number is greater than (or less than) some other random number.
You'll also need to sort on your random number column. Otherwise, Google App Engine will return 10 entities whose numbers are greater (or less) than your number, but it will not pick them in a random order. So if you are picking records whose random number is greater than a random number, sort ascending on the column; otherwise, sort descending.
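A minimal sketch of this approach, assuming an in-memory list of dicts stands in for the datastore. make_entities and fetch_random are hypothetical names, not App Engine APIs, and sorting the list only mimics the datastore's sorted index on rand_num. The wrap-around when too few entities lie above the threshold is an addition not described in the answer above.

```python
import bisect
import random

# Hypothetical stand-in for the datastore: each "entity" is a dict whose
# 'rand_num' property is assigned once, when the entity is created.
def make_entities(n, seed=0):
    rng = random.Random(seed)
    return [{"id": i, "rand_num": rng.random()} for i in range(n)]

def fetch_random(entities, count=10, rng=random):
    """Emulate filter('rand_num >=', r).order('rand_num').fetch(count)."""
    # The datastore keeps an index sorted on rand_num; sorting mimics it.
    ordered = sorted(entities, key=lambda e: e["rand_num"])
    keys = [e["rand_num"] for e in ordered]
    threshold = rng.random()
    start = bisect.bisect_left(keys, threshold)
    results = ordered[start:start + count]
    # Assumption: if the threshold landed near 1.0 and too few entities
    # qualify, wrap around to the smallest rand_num values. Only entities
    # before `start` are taken, so nothing is returned twice.
    if len(results) < count:
        results += ordered[:min(count - len(results), start)]
    return results
```

Note that the ten results are neighbours in rand_num order, which is exactly the clustering the next paragraph points out.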
This isn't totally random, however, since entities with nearby random numbers will tend to show up together. If you want to beat this, do ten queries based around ten random numbers, but this will be less efficient.
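The "ten queries around ten random numbers" variant can be sketched the same way, again assuming an in-memory list of dicts in place of the datastore (fetch_random_multi is a hypothetical name). Each iteration models one query that returns a single entity. Two thresholds can land on the same entity, so this sketch drops duplicates and may return fewer than ten; a real implementation would probably retry.

```python
import bisect
import random

def fetch_random_multi(entities, count=10, rng=random):
    """Run `count` independent single-result queries, one per random threshold."""
    ordered = sorted(entities, key=lambda e: e["rand_num"])
    keys = [e["rand_num"] for e in ordered]
    picked = {}
    for _ in range(count):
        start = bisect.bisect_left(keys, rng.random())
        # Emulates filter('rand_num >=', r).order('rand_num').get(),
        # wrapping to the first entity if no rand_num is >= r.
        entity = ordered[start % len(ordered)]
        picked[entity["id"]] = entity  # keyed by id, so duplicates collapse
    return list(picked.values())
```

This costs one query per result instead of one query in total, which is the efficiency trade-off mentioned above.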
Jason Hall's answer and the one here aren't horrible, but as he mentions, they are not really random either. Even doing ten queries will not be random if, for example, the random numbers are all grouped together. To keep things truly random, here are two possible solutions:
Solution 1
Assign an index to each datastore object, keep track of the maximum index, and randomly select an index every time you want to get a random record:
MyObject.all().filter('index =', random.randrange(0, maxindex + 1)).get()
Upside: Truly random. Fast.
Downside: You have to maintain the indices carefully when adding and deleting objects, which can make both operations O(N).
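A minimal in-memory sketch of Solution 1, assuming a dict keyed by index stands in for querying on the index property (IndexedStore and its methods are hypothetical names). Deletion closes the gap by shifting every later entity down one slot, which illustrates why it is O(N): each shift would be a separate write in a real datastore.

```python
import random

class IndexedStore:
    """Every entity carries a dense integer index in 0..max_index,
    so a random integer in that range names a random entity."""

    def __init__(self):
        self.by_index = {}   # stands in for a query on the 'index' property
        self.max_index = -1  # highest index currently assigned

    def add(self, value):
        # Adding only needs the next free index.
        self.max_index += 1
        self.by_index[self.max_index] = value

    def delete(self, index):
        # Close the gap by shifting every later entity down one slot.
        # In a real datastore each shift is a put, hence O(N).
        for i in range(index, self.max_index):
            self.by_index[i] = self.by_index[i + 1]
        del self.by_index[self.max_index]
        self.max_index -= 1

    def random_one(self, rng=random):
        # Same idea as filter('index =', random.randrange(0, maxindex + 1)).
        return self.by_index[rng.randrange(0, self.max_index + 1)]

    def random_many(self, count, rng=random):
        # random.sample picks distinct indices, so no entity repeats.
        picks = rng.sample(range(self.max_index + 1), count)
        return [self.by_index[i] for i in picks]
```

Fetching ten entities this way means ten lookups by index, one per sampled integer.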
Solution 2
Assign a random number to each datastore entity when it is created. Then, to get a random record the first time, query for a record whose random number is greater than or equal to some other random number, ordered by the random numbers (i.e. MyObject.all().filter('rand_num >=', random.random()).order('rand_num')). Then save that query's cursor in memcache. To get a random record after the first time, load the cursor from memcache and fetch the next item. If there are no items left after the cursor, run the query again.
To prevent the sequence of objects from repeating, on every datastore read, give the entity you just read a new random number and save it back to the datastore.
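A rough in-memory model of Solution 2, assuming a list of dicts stands in for the datastore and a plain dict for memcache (RandomCursorSampler is a hypothetical name). Real datastore cursors are opaque positions in the index; here the "cursor" is modelled as the rand_num value of the last entity read, so the next read resumes just past it. Ties on rand_num are ignored for simplicity.

```python
import random

class RandomCursorSampler:
    def __init__(self, entities, rng=None):
        self.entities = entities  # stand-in for the datastore
        self.cache = {}           # stand-in for memcache
        self.rng = rng if rng is not None else random.Random()

    def _first_after(self, value, strict):
        # Emulates filter('rand_num >', value) (or '>=').order('rand_num').get()
        if strict:
            candidates = [e for e in self.entities if e["rand_num"] > value]
        else:
            candidates = [e for e in self.entities if e["rand_num"] >= value]
        return min(candidates, key=lambda e: e["rand_num"]) if candidates else None

    def next_random(self):
        if not self.entities:
            raise ValueError("no entities to sample from")
        entity = None
        cursor = self.cache.get("cursor")
        if cursor is not None:
            # Resume just past the last entity read.
            entity = self._first_after(cursor, strict=True)
        while entity is None:
            # First call, or nothing left after the cursor: run the query
            # again from a fresh random number.
            entity = self._first_after(self.rng.random(), strict=False)
        # Save the position, then give the entity a new random number and
        # "put" it back so the sequence of results does not repeat.
        self.cache["cursor"] = entity["rand_num"]
        entity["rand_num"] = self.rng.random()
        return entity
```

Each call does one read plus one write, matching the downside listed below.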
Upside: Truly random. No complex indices to maintain.
Downside: You need to keep track of a cursor, and you need to do a put every time you get a random record.