0

Suppose, In my app, I ask users to input some string. A user can input string multiple times. Whenever any user inputs a string, I log it in the database along with the day. Many strings can be same, even though inputted by different users. In the home page, I need to give the interface such that any user can query for top n (say 50) strings in any time period (say between last 45 days, or 10 Jan 2012 to 30 Jan 2012). If it was SQL, I could have written query like:

select string, count(*)
from userStrings where day >= d1 and day <= d2
group by string 
order by count(*) desc
limit n

For each user query, I can't process the record at query time - there can be millions of records. If the time period constraint was not there, I could have done something like this - Create a class for UserString and maintain unique object of it for each distinct user string, retrieve corresponding object for user inputted string, and increment it's count [Even with approach, I assume the datastore will have to process all UserStrings objects (~ 100000) and return me the top n - so it can itself be very heavy query].

I am using JDO. My obvious goal is to minimize the app engine cost : CPU + data.

Thanks,

Anil Padia
  • 513
  • 1
  • 6
  • 12

1 Answers1

0

You can use the App Engine Task Queue to process the strings offline. If you need real-time answers you could use memcache to store a temporary record of how many times that word has been entered during the day and background the processing.

  • Hi Sebastian. The question is more about how should I model my data to efficiently answer the queries. In that sense, it is more of algorithmic question under the constraints of Google App Engine. I can also use cron jobs to process the data. Real-time is not an absolute requirement. – Anil Padia Jun 27 '12 at 05:53