0

I am looking for suggestions on a very common problem on the Google App Engine platform: keeping counters consistent. I have a task that loads the groups of a domain and then creates a separate task for each group to load that group's members. Because there are thousands of groups and members, there will be a very large number of tasks. One task fetches one page of groups, and within that task I create one task per group to fetch its members. To know whether all groups have been loaded, I just check the nextPageToken and then set the groups-loading flag to finished.

However, because a separate task loads the members of each group, I need to track whether all of the group-member tasks have finished. The problem is that many tasks updating a single numGroupMembersFinished count will cause concurrency issues: at some point the count will get corrupted and no longer return correct data.

Gaurav Sachdeva

4 Answers

2

My answer is general because your question doesn't include any code or a proposed solution, and you don't say where you plan to keep that counter.

Many articles on the web cover this. Google for "sharding counters" for a semi-scalable way to count datastore entities quickly in O(1) time.
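As a rough illustration of the sharding idea, here is a minimal in-memory sketch in Go. It is not datastore code: the ShardedCounter type, its methods, and countConcurrently are hypothetical names made up for this example, and each mutex-protected shard only stands in for a separate shard entity. Writers pick a random shard so they rarely contend with each other, and a read sums all shards.

```go
package main

import (
	"math/rand"
	"sync"
)

// shard holds one slice of the total. Each shard has its own lock, so
// concurrent writers rarely block each other.
type shard struct {
	mu    sync.Mutex
	count int64
}

// ShardedCounter is an in-memory stand-in for a set of shard entities
// (hypothetical type, not part of any App Engine SDK).
type ShardedCounter struct {
	shards []shard
}

// NewShardedCounter creates a counter with n independent shards (n must be > 0).
func NewShardedCounter(n int) *ShardedCounter {
	return &ShardedCounter{shards: make([]shard, n)}
}

// Increment adds delta to one randomly chosen shard.
func (c *ShardedCounter) Increment(delta int64) {
	s := &c.shards[rand.Intn(len(c.shards))]
	s.mu.Lock()
	s.count += delta
	s.mu.Unlock()
}

// Total sums every shard. The read cost depends on the number of shards,
// not on how many increments were made.
func (c *ShardedCounter) Total() int64 {
	var total int64
	for i := range c.shards {
		s := &c.shards[i]
		s.mu.Lock()
		total += s.count
		s.mu.Unlock()
	}
	return total
}

// countConcurrently simulates `tasks` workers that each report completion once
// and returns the total seen after all of them are done.
func countConcurrently(numShards, tasks int) int64 {
	c := NewShardedCounter(numShards)
	var wg sync.WaitGroup
	for i := 0; i < tasks; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			c.Increment(1)
		}()
	}
	wg.Wait()
	return c.Total()
}
```

In a real datastore version each shard would presumably be its own entity updated in its own transaction; the number of shards is a tuning knob that trades write throughput against read cost.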

More importantly, look at the memcache API. It has a function to atomically increment/decrement counters stored there. That operation is guaranteed never to have concurrency issues; however, you would still need some way to recover and/or double-check that the memcache entry wasn't evicted, maybe by also keeping the count stored in an entity that you set asynchronously and "get by key" to always get its latest value.
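The recover-after-eviction idea can be sketched as follows. This is only an in-memory simulation under stated assumptions: fakeCache, backupStore, Counter, and demoEviction are hypothetical names invented for the example, the fake cache merely imitates an atomic increment that fails on a missing key, and the backup write is done synchronously here for simplicity rather than asynchronously.

```go
package main

import (
	"errors"
	"sync"
)

// errCacheMiss mimics a cache reporting that the key is absent (e.g. evicted).
var errCacheMiss = errors.New("cache miss")

// fakeCache is an in-memory stand-in for a cache with atomic increments.
type fakeCache struct {
	mu     sync.Mutex
	values map[string]uint64
}

// Increment atomically adds delta to an existing key, or reports a miss.
func (f *fakeCache) Increment(key string, delta uint64) (uint64, error) {
	f.mu.Lock()
	defer f.mu.Unlock()
	v, ok := f.values[key]
	if !ok {
		return 0, errCacheMiss
	}
	f.values[key] = v + delta
	return v + delta, nil
}

// Add stores v only if the key is absent, so concurrent recoveries
// cannot overwrite a value that another task has already reseeded.
func (f *fakeCache) Add(key string, v uint64) {
	f.mu.Lock()
	defer f.mu.Unlock()
	if _, ok := f.values[key]; !ok {
		f.values[key] = v
	}
}

// Evict simulates the cache dropping the key.
func (f *fakeCache) Evict(key string) {
	f.mu.Lock()
	defer f.mu.Unlock()
	delete(f.values, key)
}

// backupStore stands in for a datastore entity fetched by key.
type backupStore struct {
	mu     sync.Mutex
	values map[string]uint64
}

// Get returns the last saved count (0 if none was saved).
func (b *backupStore) Get(key string) uint64 {
	b.mu.Lock()
	defer b.mu.Unlock()
	return b.values[key]
}

// PutMax saves v unless a larger value is already stored, so late
// writes never move the backup backwards.
func (b *backupStore) PutMax(key string, v uint64) {
	b.mu.Lock()
	defer b.mu.Unlock()
	if v > b.values[key] {
		b.values[key] = v
	}
}

// Counter combines the fast cache counter with the durable backup.
type Counter struct {
	key    string
	cache  *fakeCache
	backup *backupStore
}

// NewCounter creates a counter with an empty cache and an empty backup.
func NewCounter(key string) *Counter {
	return &Counter{
		key:    key,
		cache:  &fakeCache{values: map[string]uint64{}},
		backup: &backupStore{values: map[string]uint64{}},
	}
}

// Increment bumps the cached count; on a miss it reseeds the cache
// from the backup and retries once. It then records the new value.
func (c *Counter) Increment() (uint64, error) {
	v, err := c.cache.Increment(c.key, 1)
	if errors.Is(err, errCacheMiss) {
		c.cache.Add(c.key, c.backup.Get(c.key))
		v, err = c.cache.Increment(c.key, 1)
	}
	if err != nil {
		return 0, err
	}
	c.backup.PutMax(c.key, v)
	return v, nil
}

// demoEviction counts to 3, evicts the cache entry, and increments again.
func demoEviction() uint64 {
	c := NewCounter("numGroupMembersFinished")
	for i := 0; i < 3; i++ {
		c.Increment()
	}
	c.cache.Evict(c.key)
	v, _ := c.Increment()
	return v
}
```

In this sequential demo the count continues at 4 after the eviction because the backup was up to date. If the backup lags behind the cache when the eviction happens, the reseeded value is too low and increments are lost.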

This still isn't 100% bulletproof, because the cache entry could be evicted at the same moment that many concurrent attempts are modifying it, so your backup datastore entity could miss a "set".

You need to calculate, based on your expected concurrent usage, whether the chances of missing an increment/decrement are greater than those of a comet hitting the earth. Hopefully you won't use it for air traffic control.

Peter Badida
Zig Mandel
1

You could use the MapReduce or Pipeline API:

https://github.com/GoogleCloudPlatform/appengine-mapreduce

https://github.com/GoogleCloudPlatform/appengine-pipelines

These let you split your problem into smaller, manageable parts, and the library handles all of the details of signaling/blocking between tasks, gathering the results, and handing them back to you when it's done.
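To show the shape of that fan-out/fan-in pattern, here is a small in-process analogy in Go using goroutines. It is not the Pipeline or MapReduce API: fanOutFanIn, loadMembers, and result are hypothetical names, and loadMembers returns a placeholder value instead of calling any real service. One worker runs per group, and the caller only continues once every worker has reported back.

```go
package main

import "sync"

// result pairs a group with the number of members loaded for it.
type result struct {
	group   string
	members int
}

// loadMembers is a hypothetical stand-in for the per-group task. It returns
// a placeholder count (the length of the group name) instead of real data.
func loadMembers(group string) int {
	return len(group)
}

// fanOutFanIn starts one worker per group, blocks until all of them have
// finished, and returns the gathered results keyed by group.
func fanOutFanIn(groups []string) map[string]int {
	// Buffered so that no worker blocks while sending its result.
	results := make(chan result, len(groups))

	var wg sync.WaitGroup
	for _, g := range groups {
		wg.Add(1)
		go func(g string) {
			defer wg.Done()
			results <- result{group: g, members: loadMembers(g)}
		}(g)
	}

	// Wait is the "all member tasks finished" signal; no shared counter
	// has to be maintained by the workers themselves.
	wg.Wait()
	close(results)

	out := make(map[string]int, len(groups))
	for r := range results {
		out[r.group] = r.members
	}
	return out
}
```

The point of the analogy is that the completion signal is owned by the coordinator rather than by a counter that every worker mutates, which is the bookkeeping those libraries take over across task queue tasks.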

Google I/O 2010 - Data pipelines with Google App Engine:

https://www.youtube.com/watch?v=zSDC_TU7rtc

Google I/O 2011: Large-scale Data Analysis Using the App Engine Pipeline API:

https://www.youtube.com/watch?v=Rsfy_TYA2ZY

Google I/O 2011: App Engine MapReduce:

https://www.youtube.com/watch?v=EIxelKcyCC0

Google I/O 2012 - Building Data Pipelines at Google Scale:

https://www.youtube.com/watch?v=lqQ6VFd3Tnw

Nicholas Franceschina
0

Zig Mandel mentioned it; here's the link to Google's own recipe for implementing a counter:

https://cloud.google.com/appengine/articles/sharding_counters

I copy-pasted the configurable sharded counter into my app (renaming some variables, etc.), and it's working great!

Zach Young
  • I realize this is pretty close to a "link only answer", but I don't see any points that can be distilled down, and even if I tried to describe the "pattern", I don't think it would help much because in my experience: implementation in GAE is not intuitive. – Zach Young May 07 '15 at 18:34
0

I used this tutorial: https://cloud.google.com/appengine/articles/sharding_counters together with the hashid library and created this Go library:

https://github.com/janekolszak/go-gae-uid

gen := gaeuid.NewGenerator("Kind", "HASH'S SALT", 11 /*id length*/)
c := appengine.NewContext(r)
id, err := gen.NewID(c)

The same approach should be easy for other languages.

Janek Olszak