3

I want to build a backend for a mobile game that includes a "real-time" global leaderboard for all players, for events that last a certain number of days, using Google App Engine (Python).

A typical usage would be as follows: - User starts and finishes a combat, acquiring points (2-5 mins for a combat) - Points are accumulated in the player's account for the duration of the event. - Player can check the leaderboard anytime. - Leaderboard will return top 10 players, along with 5 players just above and below the player's score.

Now, there is no real constraint on the real-time aspect, the board could be updated every 30 seconds, to every hour. I would like for it to be as "fast" as possible, without costing too much.

Since I'm not very familiar with GAE, this is the solution I've thought of:

  • Each Player entity has a event_points attribute
  • Using a Cron job, at a regular interval, a query is made to the datastore for all players whose score is not zero. The query is sorted.
  • The cron job then iterates through the query results, writing back the rank in each Player entity.

When I think of this solution, it feels very "brute force".

The problem with this solution lies with the cost of reads and writes for all entities. If we end up with 50K active users, this would mean a sorted query of 50K+1 reads, and 50k+1 writes at regular intervals, which could be very expensive (depending on the interval)

I know that memcache can be a way to prevent some reads and some writes, but if some entities are not in memcache, does it make sense to query it at all? Also, I've read that memcache can be flushed at any time anyway, so unless there is a way to "back it up" cheaply, it seems like a dangerous use, since the data is relatively important.

Is there a simpler way to solve this problem?

  • Whoever voted to close this question does not understand the meaning of "opinion-based". The answer I provided is based on App Engine's documentation, not my personal preference. Obviously, there are often more than one way to solve a problem, but this is a very good question with a specific, well-described problem. – Andrei Volgin Feb 25 '14 at 02:20

3 Answers3

2

You don't need 50,000 reads or 50,000 writes. The solution is to set a sorting order on your points property. Every time you update it, the datastore will update its order automatically, which means that you don't need a rank property in addition to the points property. And you don't need a cron job, accordingly.

Then, when you need to retrieve a leader board, you run two queries: one for 6 entities with more or equal number of points with your user; second - for 6 entities with less or equal number of points. Merge the results, and this is what you want to show to your user.

As for your top 10 query, you may want to put its results in Memcache with an expiration time of, say, 5 minutes. When you need it, you first check Memcache. If not found, run a query and update the Memcache.

EDIT:

To clarify the query part. You need to set the right combination of a sort order and inequality filter to get the results that you want. According to App Engine documentation, the query is performed in the following order:

  1. Identifies the index corresponding to the query's kind, filter properties, filter operators, and sort orders.
  2. Scans from the beginning of the index to the first entity that meets all of the query's filter conditions.
  3. Continues scanning the index, returning each entity in turn, until it encounters an entity that does not meet the filter conditions, or reaches the end of the index, or has collected the maximum number of results requested by the query.

Therefore, you need to combine ASCENDING order with GREATER_THAN_OR_EQUAL filter for one query, and DESCENDING order with LESS_THAN_OR_EQUAL filter for the other query. In both cases you set the limit on the results to retrieve at 6.

One more note: you set a limit at 6 entities, because both queries will return the user itself. You can add another filter (userId NOT_EQUAL to your user's id), but I would not recommend it - the cost is not worth the savings. Obviously, you cannot use GREATER_THAN/LESS_THAN filters for points, because many users may have the same number of points.

Andrei Volgin
  • 40,755
  • 6
  • 49
  • 58
  • The second query in datastore is quite expensive as it requires the cursor to traverse through first N entities so he might end up doing 50000 reads per request. It may not be a bad idea to maintain a table like table[x]=[points], x=1-1000,points=list of points in the range. This can be done by cron job and requires O(N) operation. This can be scaled pretty well. For small number of points you can simply keep a pickled map, for bigger number you can create a dedicated entity for this. – specialscope Feb 24 '14 at 07:13
  • Of course not! There is no need to use a cursor. And there is definitely no need for any special tables. All the data is already available in the index. – Andrei Volgin Feb 24 '14 at 08:15
  • This is a scan of an index in a datastore, not of entities themselves. You don't pay for it. You only pay for entities retrieved. – Andrei Volgin Feb 24 '14 at 09:10
  • Thanks for the answer, I'll definitely be looking into this strategy in depth! – user3345143 Feb 25 '14 at 01:56
  • Wondering how would one reset the leader-board when there is a new tournament ? All the points needs to be made zero. – Neil May 08 '14 at 10:51
  • @NeilGhosh I would set that property as a member of a child entity of player – Mazyod Jul 29 '14 at 17:43
  • @Mazyod by saying "that property" do you mean the score and there would be a separate child entity with TourneyID as key and score as value? or this new entity should be embedded into Player profile entity ? In later case google may now allow data more than 10K per entity – Neil Jul 30 '14 at 04:15
  • @NeilGhosh "that property" is the score. As for the solution, I meant the former. This way, you can also keep a history of all the tournament outcomes. I guess you can also do the latter using a repeatable property of strings with a certain format `"tournamentid:score"`. It's not a hack, even webapp2 uses this for auth_ids `own:username`, `google:username`, ... etc. – Mazyod Jul 30 '14 at 06:46
1

Here is a Google Developer article explaining similar problem and the solution using the Google code JAM ranking library. Further help and extension to this library can be discussed in the Google groups forum.

The library basically creates a N-ary tree with each node containing the count of the scores in a particular range. The score ranges are further divided all the way down till leaf node where its a single score . A tree traversal ( O log(n) ) can be used to find the number of players with score higher than the specific score. That is the rank of the player. It also suggests to aggregate the score submission requests in a pull taskqueue and then process them in a batch in a background thread in a backend.

Neil
  • 5,919
  • 15
  • 58
  • 85
  • While this link may answer the question, it is better to include the essential parts of the answer here and provide the link for reference. Link-only answers can become invalid if the linked page changes. – Mike Precup Aug 19 '14 at 18:29
  • @MikePrecup added brief explanation – Neil Nov 06 '14 at 09:09
0

Whether this is simpler or not is debatable.

I have assumed that ranking is not just a matter of ordering an accumulation of points, in which case thats just a simple query. I ranking involves other factors rather than just current score.

I would consider writing out an Event record for each update of points for a User (effectively a queue) . Tasks run collecting all the current Event records, In addition you maintain a set of records representing the top of the leaderboard. Adjust this set of records, based on the incoming event records. Discard event records once processed. This will limit your reads and writes to only active events in a small time window. The leader board could probably be a single entity, and fetched by key and cached.

I assume you may have different ranking schemes like current active rank (for the current 7 days), vs all time ranks. (ie players not playing for a while won't have a good current rank).

As the players view their rank, you can do that with two simple queries Players.query(Players.score > somescore).fetch(5) and Players.query(Players.score < somescore).fetch(5) this shouldn't cost too much and you could cache them.

Tim Hoffman
  • 12,976
  • 1
  • 17
  • 29