
I'm creating an app on GAE with Java, and looking for advice on how to handle scheduling user notifications (which will be email, text, push, whatever). There are a couple of ways notifications will be generated: when a producer creates content, and on a consumer's schedule. The latter is the tricky part, because a consumer can change its schedule at any time. Here are the options I have considered and my concerns so far:

  1. Keep an entry in the datastore for each consumer, indexed by the time until the next notification. My concern is over the lag for an eventually-consistent index. The longest lag I've seen reported is about 4 hours, which would be unacceptable for this use-case. A user should not delay their schedule by a week, then 4 hours later receive a notification from the old schedule.
  2. The same as above, but with each entry sharing a common parent so that I can use an ancestor query to eliminate its eventual-ness. My concern is that there could be enough consumers to cause a problem with contention. In my wildest dreams I could foresee something like 10,000 schedule changes per minute at peak usage.
  3. Schedule a task for each consumer. When changing the schedule, it could delete the old task and create a new one at the new time. My concern has to do with the interaction of tasks and datastore transactions, since the schedule will be stored in the datastore. The documentation notes that enqueuing a task plays nicely with transactions, but what about deleting one? I would not want a task to be deleted only to have the add fail as part of its transaction.
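To put the contention concern in option 2 into numbers: the Datastore guidance has long been to plan for roughly one sustained write per second per entity group. Assuming that rule of thumb (it is a guideline, not a hard quota), the sketch below converts the peak estimate of 10,000 schedule changes per minute into writes per second against a single shared parent. The class and method names are made up for illustration.

```java
// Back-of-envelope check for option 2: every schedule change writes into the
// same entity group (one shared parent), so all of those writes contend.
class EntityGroupLoad {
    // Assumed rule-of-thumb sustained write rate for one entity group.
    static final double WRITES_PER_SEC_PER_GROUP = 1.0;

    // Converts a per-minute change rate into writes per second.
    static double writesPerSecond(double changesPerMinute) {
        return changesPerMinute / 60.0;
    }

    // How many times over the guideline a single shared parent would be.
    static double overloadFactor(double changesPerMinute) {
        return writesPerSecond(changesPerMinute) / WRITES_PER_SEC_PER_GROUP;
    }

    public static void main(String[] args) {
        double peak = 10_000; // schedule changes per minute, from the question
        System.out.printf("%.1f writes/sec, %.0fx the guideline%n",
                writesPerSecond(peak), overloadFactor(peak));
    }
}
```

At the stated peak this prints `166.7 writes/sec, 167x the guideline`, so a single common parent would be far past what one entity group is expected to absorb; even 60 changes per minute already reaches the guideline.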

Edit: I experimented with deleting tasks (for option 3), and unfortunately a delete that is part of a failed transaction still succeeds. That is a disappointing asymmetry. I will probably end up going that route anyway, but adding some extra logic and datastore flags to ensure rogue tasks that didn't get deleted properly simply do nothing when they execute.
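One way to build the failsafe described in the edit is to make every task carry a token that the handler checks against the stored schedule. The sketch below assumes a per-consumer version counter serves as the "datastore flag"; that choice, the class names, and the in-memory `Map` standing in for the datastore are all hypothetical, not the actual implementation.

```java
import java.util.HashMap;
import java.util.Map;

// In-memory model of the "rogue task does nothing" failsafe. The stored
// schedule carries a version number, and each task carries the version it was
// created for. The Map stands in for the datastore; all names are hypothetical.
class ScheduleFailsafe {
    static final class Schedule {
        final long version;
        final long nextNotificationMillis;
        Schedule(long version, long nextNotificationMillis) {
            this.version = version;
            this.nextNotificationMillis = nextNotificationMillis;
        }
    }

    private final Map<String, Schedule> store = new HashMap<>();

    // Called when a consumer changes its schedule. In the real app this write
    // and the enqueue of the new task would share one transaction; the old task
    // may or may not get deleted, and correctness must not depend on it.
    long reschedule(String consumerId, long nextNotificationMillis) {
        Schedule old = store.get(consumerId);
        long newVersion = (old == null) ? 1 : old.version + 1;
        store.put(consumerId, new Schedule(newVersion, nextNotificationMillis));
        return newVersion; // goes into the new task's payload
    }

    // Task handler: only send if the task still matches the stored schedule.
    boolean onTaskFired(String consumerId, long taskVersion) {
        Schedule current = store.get(consumerId);
        if (current == null || current.version != taskVersion) {
            return false; // stale ("rogue") task: do nothing
        }
        // A hypothetical sendNotification(consumerId) call would go here.
        return true;
    }

    public static void main(String[] args) {
        ScheduleFailsafe app = new ScheduleFailsafe();
        long v1 = app.reschedule("alice", 1_000L);
        long v2 = app.reschedule("alice", 2_000L); // old task's delete may have failed
        System.out.println(app.onTaskFired("alice", v1)); // false: stale task ignored
        System.out.println(app.onTaskFired("alice", v2)); // true: current task sends
    }
}
```

Because the check reads the stored schedule, a delete that "succeeds" inside a failed transaction is harmless: whichever task survives, only the one matching the committed version sends anything.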

Eric Simonton
  • 10,000 schedule changes per minute imply 1 billion users, unless your users have nothing else to do but change their schedule every day, in which case you still need 14.4 million users. I would not worry about this until you get your first 100,000 users :) – Andrei Volgin May 24 '14 at 20:48
  • @Andrei Here was my logic (note my "wildest dreams" disclaimer): 10 million users, each changing a consumer's schedule weekly. That's about 1,000 changes/minute on average. Add an order of magnitude to get peak usage of 10,000 changes/minute. I don't anticipate the app becoming that popular, but as long as I'm in the design phase I might as well consider the best/worst case. If it turns out to be too difficult to design for that case, then I'll consider a "worry about it later" approach. – Eric Simonton May 24 '14 at 21:16
  • Agreed. You are right that you will not be able to handle 10,000 writes per minute on an entity group. There are, however, other reasons not to use child-parent entities. – Andrei Volgin May 24 '14 at 21:49
  • For item 2: the common parent would be the user. This means queries would be consistent at the level of the user, so I doubt you would have write speed problems. You could use named tasks for each user, with a name that is unique for a user/time interval. I am not sure you could 100% guarantee that task rescheduling will always work (what if you had to flush a task queue on an app update?). So you could then periodically run cron jobs which look for users who haven't had a task run in the allotted time, and create a new task with the correct name for the interval. This will avoid duplicate tasks. – Tim Hoffman May 25 '14 at 01:03
  • With your projected number of users, have you considered the cost of scheduling and running that number of tasks? – Tim Hoffman May 25 '14 at 01:04
  • @Tim I'm not sure how users-as-parents would help. I was considering pairing options (1) or (2) with a cron job, or something more sophisticated, and either way running a query to determine all consumers who have notifications due. If they have different parents, it won't eliminate the consistency delay. How are you envisioning it? Also, if I'm reading the [new rates](https://developers.google.com/appengine/pricing#cost_resource) correctly, there is no extra charge for having many tasks queued (except maybe the storage space to store their states?). – Eric Simonton May 25 '14 at 05:00
  • The only time you are likely to have issues with consistency will be when changes are made (add/update/delete) that affect the next notification period. Older scheduled notifications are unlikely to be affected. If you are scheduling tasks for each consumer, and each consumer-specific task queries its schedule entities using the owner as the parent, those queries will always be consistent. The task fires, fetches the schedule and checks it, then reschedules itself. If a new schedule is added in the meantime, prospective search or the act of adding the schedule queues a new task. – Tim Hoffman May 25 '14 at 05:16
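Tim's named-task idea works because App Engine rejects a second insert of a task name that has already been used, so a deterministic name per user and interval lets both the normal reschedule path and a backup cron job try to enqueue the same notification safely. The sketch below models only that rule: `FakeQueue` is a stand-in, not the Task Queue API, and the `notify-<userId>-<intervalIndex>` naming scheme and weekly, epoch-aligned intervals are assumptions (real task names are restricted to letters, digits, `-` and `_`, so user IDs would need to fit that).

```java
import java.util.HashSet;
import java.util.Set;

// Model of the named-task deduplication idea: the task name is derived from
// the user and the notification interval, so repeated enqueue attempts for the
// same interval collapse into one. FakeQueue only mimics the rule that a used
// task name cannot be inserted again; it is not the real queue.
class NamedTaskDedup {
    static final long INTERVAL_MILLIS = 7L * 24 * 60 * 60 * 1000; // weekly (assumed)

    // Hypothetical naming scheme: "notify-<userId>-<intervalIndex>".
    static String taskName(String userId, long dueMillis) {
        return "notify-" + userId + "-" + (dueMillis / INTERVAL_MILLIS);
    }

    static final class FakeQueue {
        private final Set<String> usedNames = new HashSet<>();

        // Returns false when the name was already used (a duplicate insert).
        boolean add(String name) {
            return usedNames.add(name);
        }
    }

    public static void main(String[] args) {
        FakeQueue queue = new FakeQueue();
        long due = 3 * INTERVAL_MILLIS + 5_000; // some time in interval 3
        System.out.println(queue.add(taskName("u42", due)));         // true: normal path
        System.out.println(queue.add(taskName("u42", due + 1_000))); // false: cron retry, same interval
    }
}
```

The cron job can therefore be blunt: it re-adds the expected task for every user who looks overdue, and only those whose task really went missing get a new one.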


Eventual consistency in the Datastore is typically measured in seconds. As Google states:

> the time delay is typically small, but may be longer (even minutes or more in exceptional circumstances).

  1. Save a time of next notification for each user. Run a cron job periodically (e.g. once per hour), and send notifications to all users who have to be notified at this time (i.e. now >= next notification).

  2. When a user's schedule is created, create a task for that user with the appropriate countdown value. When the task executes, it creates the next task for this user.
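Approach (1) boils down to a periodic sweep over a "next notification" timestamp. The sketch below is a plain-Java model under stated assumptions: the `List` stands in for a Datastore query filtered on the next-notification property, each user has a fixed period, and the `CronSweep`/`User` names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Model of approach (1): each user stores the time of the next notification,
// and a cron handler notifies everyone who is due, then advances their next
// time. The List stands in for a Datastore query; names are hypothetical.
class CronSweep {
    static final class User {
        final String id;
        final long periodMillis; // must be > 0, or the advance loop never ends
        long nextNotificationMillis;

        User(String id, long nextNotificationMillis, long periodMillis) {
            this.id = id;
            this.nextNotificationMillis = nextNotificationMillis;
            this.periodMillis = periodMillis;
        }
    }

    // Runs once per cron tick; returns the IDs that were notified.
    static List<String> sweep(List<User> users, long nowMillis) {
        List<String> notified = new ArrayList<>();
        for (User u : users) {
            if (nowMillis >= u.nextNotificationMillis) { // "now >= next notification"
                notified.add(u.id); // send the email/text/push here
                // Advance past now so a late cron run does not send a backlog.
                while (u.nextNotificationMillis <= nowMillis) {
                    u.nextNotificationMillis += u.periodMillis;
                }
            }
        }
        return notified;
    }

    public static void main(String[] args) {
        long hour = 60L * 60 * 1000;
        List<User> users = new ArrayList<>();
        users.add(new User("a", 0, 24 * hour));
        users.add(new User("b", 5 * hour, 24 * hour));
        System.out.println(sweep(users, hour));     // [a]
        System.out.println(sweep(users, 6 * hour)); // [b]
    }
}
```

Since the real query runs against an eventually consistent index, a handler built this way could re-read each candidate entity by key (a strongly consistent lookup) and re-check the time before sending, which guards against a schedule that changed seconds earlier.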

The first approach is probably more efficient, especially if you choose a large enough window for your cron job.

As for transactions, I don't see why you need them. You can design your system so that in the very rare failure case a user will receive two notifications instead of one (one from the old schedule and one from the new schedule). This is not such a bad outcome that you need to design around it.

Andrei Volgin
  • Thank you for the suggestions. If it turns out that task deletes are not transactional, I will probably rely on "minutes at the most" for the eventual consistency lag, as you suggest in (1). Concerning (2), that contradicts [the documentation](https://developers.google.com/appengine/docs/java/taskqueue/#Java_Task_names): `Once a task with name N is written, any subsequent attempts to insert a task named N fail.` – Eric Simonton May 24 '14 at 21:08
  • You are right about tasks - I updated my answer and added a thought on transactions. – Andrei Volgin May 24 '14 at 21:54
  • Thanks for bouncing ideas around with me. I did a little experimenting and came up with a solution that is basically my number (3) and your number (2), with an extra failsafe as discussed in the edit to my original post. – Eric Simonton Jun 04 '14 at 19:47