4

I'm building an app on GAE that needs to report on events occurring. An event has a type and I also need to report by event type.

For example, say there is an event A, B and C. They occur periodically at random. User logs in and creates a set of entities to which those events can be attributed. When the user comes back to check the status, I need to be able to tell how many events of A, B and/or C occurred during a specific time range, say a day or a month.

The 1000 limit is throwing a wrench into how I would normally do it. I don't need to retrieve all of the entities and present them to the user, but I do need to show the total count for a specific date range. Any suggestions?

I'm a bit of python/GAE noob...

David Underhill
  • 15,896
  • 7
  • 53
  • 61
Linsidious
  • 41
  • 1
  • 2
  • Possible duplicate of http://stackoverflow.com/questions/993059/is-there-a-way-to-do-aggregate-functions-on-google-app-engine – rds Nov 29 '12 at 21:35

6 Answers6

7

App Engine is not a relational database and you won't be able to quickly do counts on the fly like this. The best approach is to update the counts at write time, not generate them at read time.

When generating counts, there are two general approaches that scale well with App Engine to minimize write contention:

  1. Store the count in Memcache or local memory and periodically flush. This is the simplest solution, but it can be volatile and data loss is probable.
  2. Use a Sharded Counter. This approach is a bit more reliable but more complex. You won't be able to sort easily by count, but you could also periodically flush to another indexed count field periodically and sort by that.
Ikai Lan
  • 2,210
  • 12
  • 13
  • That really won't work in this scenario - data is too granular to be able to effectively shard. I also need to keep historic records for a while. But thank you your time! – Linsidious Feb 04 '10 at 20:36
  • Read Ikai's link - sharded counters don't rely on sharding the data, but rather on sharding the counter itself. – Nick Johnson Feb 08 '10 at 10:44
3

Results of datastore count() queries and offsets for all datastore queries are no longer capped at 1000.

Since Version 1.3.6

systempuntoout
  • 71,966
  • 47
  • 171
  • 241
0

My approach would be to have an aggregate model or models to keep track of event types, dates and counts. I'm not 100% how you should model this given your requirements, though.

Then, I'd fire off deferred tasks to asynchronously update the appropriate aggregate models whenever a user does something that triggers an event.

Nick Johnson's Background work with the deferred library article has much more information, and provides a framework that you might find useful for doing the kind of aggregation you're talking about.

Will McCutchen
  • 13,047
  • 3
  • 44
  • 43
  • Thank you! I'll definitely check it out. Main issue is that aggregate counts do not work well with reporting needs, especially when filtering data. As previous answer mentioned, a relational db would be ideal, but I'm thinking there has to be a way to accomplish something similar. – Linsidious Feb 04 '10 at 20:38
  • I also forgot to add a bit about averages. One way I was thinking this could be accomplished is to use a loop and "page through" the data. Each time it returns, do a sum of all values and increment counter. Then once it returned the last value, take the total sum and divide by the counter. However, I do not know if this will work with the .filter properties well or the GQL. Any ideas? – Linsidious Feb 04 '10 at 20:43
0

This post is quite old, but I would like to provide a useful reference. App Engine now offers a built-in API to access datastore statistics:

For Python,

from google.appengine.ext.db import stats

global_stat = stats.GlobalStat.all().get()
print 'Total bytes stored: %d' % global_stat.bytes
print 'Total entities stored: %d' % global_stat.count

For Java,

import com.google.appengine.api.datastore.DatastoreService;
import com.google.appengine.api.datastore.DatastoreServiceFactory;
import com.google.appengine.api.datastore.Entity;
import com.google.appengine.api.datastore.Query;

// ...
DatastoreService datastore = DatastoreServiceFactory.getDatastoreService();
Entity globalStat = datastore.prepare(new Query("__Stat_Total__")).asSingleEntity();
Long totalBytes = (Long) globalStat.getProperty("bytes");
Long totalEntities = (Long) globalStat.getProperty("count");

It is also possible to filter entities number only for a particular kind. Take a look at this reference:

https://developers.google.com/appengine/docs/python/datastore/stats https://developers.google.com/appengine/docs/java/datastore/stats

Nimit Pattanasri
  • 1,602
  • 1
  • 26
  • 37
  • Another way is to keep a counter in the datastore (that you update with every addition/deletion), or sharding this counter if necessary. https://blog.svpino.com/2015/03/08/how-to-count-all-entries-of-a-given-type-in-the-app-engine-datastore – svpino Mar 08 '15 at 16:24
  • Is there any other way to get the overall entity status only using Datastore Library? Not using `app.engine` library! – Madhi Oct 01 '18 at 09:40
0

Would a solution using cursors (like the one below) work for you? I personally use this method to count the number of entries in a scenario similar to yours, and haven't seen yet any problems with it (although I run on a schedule, since constant querying of the data store is pretty taxing on the CPU quota).

def count(query): 
  i = 0 
  while True: 
    result = query.fetch(1000) 
    i = i + len(result) 
    if len(result) < 1000: 
      break 
    cursor = query.cursor() 
    query.with_cursor(cursor) 
  return i 
dpq
  • 9,028
  • 10
  • 49
  • 69
-1

This sounds very similar to a question that I posed on StackOverflow.

How to get the distinct value of one of my models in Google App Engine I needed to know how to get a distinct values for an entities within my models and there is going to be over 1000 entities for that model.

Hope that helps.

Community
  • 1
  • 1
AutomatedTester
  • 22,188
  • 7
  • 49
  • 62
  • It is a similar issue, except the end result is not outputted as distinct entities, but as a sum or avg. Thanks! – Linsidious Feb 04 '10 at 20:40