
Here is the scenario:

I am working with a SQL Server database that has a stored procedure responsible for returning headers for Web feed items (RSS/Atom) that I serve as feeds through a web application.

This stored procedure, when called by a Service Broker task running at a given interval, should verify whether there has been a significant change in the underlying data. If so, it triggers a resource-intensive activity: formatting the feed item header through a call to the web application, which retrieves the data, formats it, and returns it to the SQL database.

There the header would be stored, ready for an RSS feed update request from the client.

Now, trying to design this to be as efficient as possible, there are still a couple of decision points I'd like your suggestions on.

My tentative approach at the stored procedure would be:

  1. gather the data into an in-memory table;
  2. create a subquery with the signature columns (those whose changes carry new information);
  3. convert them to XML with FOR XML AUTO;
  4. hash the result with MD5 (using HASHBYTES or fn_repl_hash_binary, depending on the size of the result);
  5. check whether the hash matches the one stored in the table where I keep the HTML awaiting feed requests;
  6. if the hash matches, do nothing; otherwise, proceed with the updates.
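The steps above might look roughly like this in T-SQL. This is only a sketch: the table and column names (`#FeedData`, `FeedHeaderCache`, `QueueHeaderRebuild`, etc.) are hypothetical stand-ins for your own schema.

```sql
-- Sketch of the change-detection steps, with hypothetical object names.
DECLARE @sig VARBINARY(16);

SELECT @sig = HASHBYTES('MD5',
    (SELECT ItemId, Title, PubDate        -- the "signature" columns only
     FROM #FeedData                       -- temp table filled in step 1
     ORDER BY ItemId                      -- fix row order so the hash is stable
     FOR XML AUTO));                      -- NB: HASHBYTES input is capped at
                                          -- 8000 bytes before SQL Server 2016

IF NOT EXISTS (SELECT 1 FROM FeedHeaderCache
               WHERE FeedId = @FeedId AND SignatureHash = @sig)
BEGIN
    -- Data changed: trigger the expensive reformatting and store the new hash.
    EXEC dbo.QueueHeaderRebuild @FeedId = @FeedId, @NewHash = @sig;
END
```

Note the explicit ORDER BY: without a deterministic row order, the same data can produce different XML and therefore a spurious hash mismatch.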

The first doubt is the best way to check whether the base data have changed.

Converting to XML inflates the data significantly - which slows hashing - and I am potentially not using the XML for anything but hashing: is there a better way to perform the check, or to pack all the data together for hashing (something CSV-like)?

The query merges and aggregates data from multiple tables, so I cannot rely on table timestamps: a change in them is not necessarily related to a change in the result set.
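Two lighter-weight alternatives to hashing the FOR XML output are worth considering; again a sketch with hypothetical names, and each has a caveat noted inline:

```sql
-- (a) Built-in checksums: fast, no XML inflation, but only 32 bits wide,
--     so collisions (missed changes) are possible, if unlikely.
SELECT CHECKSUM_AGG(BINARY_CHECKSUM(ItemId, Title, PubDate))
FROM #FeedData;

-- (b) CSV-style packing: concatenate the signature columns with a
--     delimiter that cannot occur in the data, then hash once.
SELECT HASHBYTES('MD5',
    (SELECT CAST(ItemId AS NVARCHAR(20)) + N'|' +
            Title + N'|' +
            CONVERT(NVARCHAR(30), PubDate, 126) + N'|'
     FROM #FeedData
     ORDER BY ItemId
     FOR XML PATH('')));   -- FOR XML PATH('') just glues the strings
                           -- together (characters like & and < still get
                           -- entitized, but the output stays compact)
```

Option (a) trades accuracy for speed; option (b) keeps the full MD5 strength while producing far less intermediate data than FOR XML AUTO.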

The second point is: what is the best way to serve the data to the web app for reformatting? I could push the data through a CLR function to the web application to get them formatted (but this is synchronous, and for multiple feed items it would create unsustainable delays)

or

I might instead save the result set and trigger multiple asynchronous calls through Service Broker. The web app could then retrieve the stored data instead of running the expensive query that produced it again.
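Queuing one asynchronous rebuild request per feed item via Service Broker could be sketched as below. The service, contract, and message type names are hypothetical; the message carries only the key, and the consumer fetches the saved result set by that key.

```sql
-- Minimal sketch: one rebuild request per feed item on a broker queue.
DECLARE @h UNIQUEIDENTIFIER;

BEGIN DIALOG CONVERSATION @h
    FROM SERVICE [//Feed/RebuildRequestService]
    TO SERVICE   '//Feed/RebuildWorkerService'
    ON CONTRACT  [//Feed/RebuildContract]
    WITH ENCRYPTION = OFF;

-- The activated procedure (or the web app draining the queue) uses the
-- id in the message body to look up the stored result set.
SEND ON CONVERSATION @h
    MESSAGE TYPE [//Feed/RebuildRequest]
    (CAST(@FeedItemId AS VARBINARY(MAX)));
```

Because the SEND returns immediately, many items can be queued in one pass without blocking the stored procedure on the web application's formatting work.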

Since I have different formats depending on the feed item category, I cannot use a single table schema - so storing the results in a table is going to be hard.

I might serialize to XML instead.

But would this provide any significant gain compared to re-running the query?

  • Just to avoid being obscure: by _significant change_ I meant **any** change affecting the columns, in the dataset used to generate the header, that brings different information to the user (for instance, a 'lastDataCheckTime' value would obviously not be considered a significant change) – eddo May 06 '12 at 16:14

1 Answer


For the efficient caching bit, have a look at query notifications. The tricky bit in implementing this in your case is that you've stated "significant change", whereas query notifications trigger on any change. But the basic idea is that your application subscribes to a query. When the results of that query change, a message is sent to the application and it does whatever it is programmed to do (typically refreshing cached data).

As for serving the data to your app, there's a saying in the business: "don't go borrowing trouble". Which is to say if the default method of serving data (i.e. a result set w/o fancy formatting) isn't causing you a problem, don't change it. Change it only if and when it's causing you a significant enough headache that your time is best spent there.

Ben Thul
  • Ben, if I got it properly, query notifications in this scenario would help notify the web application that the underlying data have changed and the feed header needs rebuilding. But SQL Server still has to check whether the result set has changed and post the message on the broker queue for the notification. This does not help me much - I can already notify the web app the same way by triggering an MVC action with an HttpWebRequest managed by an existing CLR dll. The point is efficiently determining whether something has changed, so that a notification has to be sent – eddo May 06 '12 at 16:07
  • I think that it's a little more efficient than you're letting on. The other side of the coin says: how timely do your updates need to be? I subscribe to many RSS feeds that update just a couple of times a day even though there are many articles posted between RSS updates. And that's okay. – Ben Thul May 06 '12 at 22:21
  • mh, yes, I had a look [here](http://msdn.microsoft.com/en-US/library/3ht3391b(v=vs.80).aspx) and that sounds interesting. I will keep it in mind for other parts of the web development (debugging brokers is tricky, and I am not sure how this would work for this specific problem with a web app, which can decide to go to sleep on its own). – eddo May 07 '12 at 04:27