Doing DB Queries Verus Storing Items in A collection?

Question

I am trying to make to make a reminder system and I am using quartz for my scheduling. However I come up with a couple possible ways how to do what I need to do but I am not sure what the best way is and how to test it.

Basically I have a reminder system that users can set reminders. It is like Google Calendar. You set the date and time that your event is and then you set a reminder by saying "remind me 15 minutes before"

So you could have a event on May 10th, 2011 9:59am and you could say reminded me "15 minutes before"

So that would be May 10th, 10:44am.

I will be in a hosted environment. (My site and the scheduling will be running of in the same environment and even in the same solution. So it can't slow down the users browsing my site by much.)

I am also using nhibernate and fluent nhibernate to do the db querying. I am using asp.net mvc 3 for my web site.

Option 1.

Do a database query every minutes and get all reminders that should be sent out in that minute. This of course will mean a database query every minute and probably too intensive for a shared environment.

Option 2.

Do a database query every 5 minutes and grab all the reminders that should be sent in that 5 minute block and store them in a collection(so memory) and then check every minute which ones need to be sent out.

This of course lessens the amount of queries done but not sure if this will get extremely memory intensive.

Option 3

Same as Option 2 but send a query every 15 minutes and store in a collection.

This of course means alot less databases queries but more stored in memory.

Option 4

Do a database query every 15 minutes and get all reminders in that block and fire them out immediately.

This means they won't be stored in memory very long and reduced amount of queries. However depending on when the user set to be reminded the email could arrive alot earlier then they set.

For instance they said remind me at 10:44am. I would have my scheduler start at 10:00am and it would grab from 10:00am to 10:15am and then 10:15am to 10:30am then 10:30am to 10:45am.

So that email would actually arrive 14 mins earlier then intended.

Hogan · Accepted Answer · 2011-05-14T19:02:55.477

1

Here is how I would solve this problem.

At the DB Tier I would create a simple queue. This list of messages would also include a send time. When queried this list would have the next item at the top.
The message agent would query this list and act on the top item or sleep till the top item on the list comes due.

One of the advantages of this technique is that you don't have the acting agent applying business rules for when it checks the queue. If you want it to wake up every minute (for example to check if there are new messages which need to be sent out) then you just make sure this queue always has an event every minute (this event could have a type that does not send a message, a "wake up" message has no targets). The agent will wake up and perform the check. Then if you want to apply more complicated scheduling rules they are easy. You don't have to recode the agent you just have to change what messages are put in the queue. (For example check every 10 mins when the system is in high use and every 20 mins when it is low use and stop checking during nightly backup). This can all be done (and changed) without changing the code on your agent.

A simple real world example

QueueTable
----------
ID int
deliverTime datetime
nagCount int
expireTime datetime
active bool
processed datetime (null)
' maybe some audit stuf...
' content of the message -- or external link
' etc

START: The agent makes a call like this

SELECT TOP 1 * 
FROM QueueTable
WHERE active = true and processed is null
ORDER BY deliverTime DESC

The agent then looks to see what the deliverTime time is:

If it has passed or in the next fuzzy boundry (1 sec?) it sends the message then sets processed to the curent time in the db and loops back to START:
If it is in the future it sleeps till that deliverTime or sets an event to wake it up at that time (depends on platform).

I originally had processed as a boolean but if you use the null to equal not processed then it can double as an audit field.

Example to check every 10 mins no matter what.

How this works: Because the results are sorted by time the soonest one will show up at the top. What we do is add in an item 10 mins from now into the result set. Thus the top item will never be more than 10 mins from the current time.

SELECT TOP 1 * 
FROM QueueTable
WHERE active = true and processed is null
UNION ALL
SELECT NULL, DATEADD(min,GETDATE(),10), null, null, false, null, ...
ORDER BY deliverTime DESC

Note, active column is being used as a flag here to show no action will be performed. This record is just a marker to wake up the agent. This method cal also adjust this depending on other rules (eg time of day because at night you don't need to check as often etc.)

edited May 14 '11 at 19:02

answered May 10 '11 at 17:09

Hogan

69,564
10
76
117

@Hogan - Could you write a simple example. I am having a bit of trouble following. I already have a send time calculated and stored in the database so that was how I was planning to get my queries by doing a Where clause to check if it falls into that block. I not sure what you mean by your second point. I am not sure what you mean by message agent and how it would act on the top item. – chobo2 May 10 '11 at 17:25
@Hogon - is agent something in ms sql server or do I do this from my C# code/nhibernate? – chobo2 May 10 '11 at 20:46
@chobo2 -- the agent is whatever performs the notifications. In your case I believe it is C# (But the solution is not tied to one architecture -- that is the agent could be anything that performs the notifications). – Hogan May 10 '11 at 21:11
@Hogan - Hmm stil having a bit of a hard time to follow. So your taking the top query that meets you condition(why do you know check if it is in that deliverTime range?) then you send it out. If the top message is in the future you then make it sleep till it is time range and then send it out and then repeat. Is that the process? – chobo2 May 10 '11 at 21:27
No. I take the top item. If it is in the future I exit this process and sleep till that message is active otherwise I send it out. – Hogan May 10 '11 at 21:38
If I've sent one out I repeat – Hogan May 10 '11 at 21:38
@Hogan - Ah I see your ordering the deliverTime and then taking the top one. How long would you sleep for? How do you figure that out look at current time and returned result time and sleep for that long? How about if you have say 3 rows that happen within the same minute how do you make sure that they all get fired before it moves to the next time? With your way it might seem that you might do a 100 queries in a minute(say if 100 tasks fall in that minute) then maybe for the next hour nothing. So I guess it comes back to should maybe I query for a X block(maybe 15mins or a hour) and then do – chobo2 May 10 '11 at 21:53
what your doing since how I see it is that your way will save alot of queries when nothing needs to be sent but if there is some peak time or something and you end up having 200 messages that follow in that minute or something you might be doing 200 queries at that time what could slow it down. Correct me wrong if I misunderstood something. – chobo2 May 10 '11 at 21:54
@chobo2 - yes if you expect to have a lot of event firing in the same "block" you could select more than just the top one. It would require the agent's business rules to be more complex. – Hogan May 11 '11 at 02:03
@Hogan - hmm well not sure what the best route is here. Like I can guess maybe what the peak times are but I have no clue how home many to take each time.I could do take a 100 but in reality that time frame has 500 or something like that. – chobo2 May 11 '11 at 05:47
@chobo2 - The best route is the same as it always is -- implement the easiest solution which is designed so it can be improved and then only optimize to complex solution if needed. Implement the 1 at a time and this can be upgraded if it is to slow -- which I doubt it will be. – Hogan May 11 '11 at 10:42
@ Hogan - I was just thinking of something. What happens if I first have one user and he sets up a reminder that is due on May 20th. So my code comes along and grabs that reminder and see that it has to wait to send out. Well what happen if that user(or another user) makes another reminder that should be sent on May 17th. That will never get sent till the May 20th one fires right? So how do you get around that? – chobo2 May 14 '11 at 18:46
@Hogan - I am not very good at sql queryies so that will do a check every 10mins? So is your solution to check X minutes when you see that you have a future task to see if a newer task has come up? If so get rid of this long future task and replace it with this new task(that might be closer to a send of date)? – chobo2 May 14 '11 at 18:58
@Hogan - So then you grab this place holder one and readd it into the mix? I am not sure this will work for me since I am not too keen on having some marker row floating around. I think I am going to try what I thought you ment and figure out how long the top of the list is away from being sent and keep querying the database to see if something new came up since to me that it almost the same thing as this marker row. – chobo2 May 14 '11 at 19:12
@chobo2 -- I am not adding a row to anything. The select statement creates a "phantom" row in the results to get this effect. Think of this like a string constant that is appended only after the results are read from the database. Please run that query against your database so you can see how it works. You can only learn by doing. – Hogan May 14 '11 at 21:10
@Hogan - I will try it out. But in the end your querying the database every 10minutes basically if nothing is found within under 10mins. – chobo2 May 15 '11 at 00:51
@chobo2 - only way to know is to check. – Hogan May 15 '11 at 01:20

score 0 · Answer 2 · answered May 10 '11 at 16:45

I would probably throw out Option 4 right away if it doesn't fit your requirements.

The other options really rely on your system profile (How many people are using it? How many reminders are there in any given 5/15 minute period?) Those are questions which you'll need to answer. Also, how much activity is already occurring on the server? If it's not already under high stress then a query every minute is not much at all.

Finally, keep in mind that if you only query every 5/15 minutes then you may miss a change/addition/deletion to the schedule if it happens after you've queried and the reminder is supposed to fall in that 5/15 minute window. Again, that comes down to the application requirements as to whether or not that is acceptable.

Ya Option 4 I probably could live with but I rather have it come within a few minutes of when they set it not like 14mins earlier. My goal is to stress test it with at least 1000 and then maybe 10,000 users and see how my system does. I am a far away from any of those numbers but that is what I want my thing to be able to handle at least. All these numbers are up in the air of how many reminders will be sent because it is all user defined. I see what you mean if they change the reminder time then it would still fire at the old time. That would not be the end of the world if it did. — chobo2, May 10 '11 at 21:58

Doing DB Queries Verus Storing Items in A collection?

2 Answers2