
Background:

We have a large (15+ million) table of items which is updated frequently throughout the day (average 300K daily changes). Pending changes are stored in a staging table, and a job runs throughout the day to read changes from this table, make the updates, and mark the changes as processed.

Many other applications use the data in the items table to perform various tasks. Often, these tasks are scheduled and intensive, as they involve comparing live data with old snapshots, and updating other systems accordingly. For example, we list items on eBay, and compare live item data with our existing listings to see whether we need to insert any new eBay listings, remove items we've sold, update quantities, etc. Because the data is so large, most of these applications run infrequently, leaving things out of date much of the time.

My Question:

We are considering implementing a publisher/subscriber pattern using Service Broker. The goal would be to publish a message when an item changes, which various other systems (like our eBay application) could subscribe to. This would enable us to make more granular updates closer to real time, rather than large and infrequent updates that involve querying all data, not just what has changed. However, after searching Google it doesn't seem like this is a common database pattern, and that raises red flags. Is this not a valid use of Service Broker (though I found a small section on doing pub/sub in Pro SQL Server 2008 Service Broker)? How is this problem normally solved? It seems like a common enough problem.
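
Here is roughly the plumbing we are picturing, as a sketch only - the object names are invented for illustration, and since Service Broker dialogs are point-to-point, "publishing" would really mean opening one conversation per subscriber:

    -- Minimal Service Broker sketch (all object names are hypothetical).
    -- Requires ALTER DATABASE ... SET ENABLE_BROKER on the database.
    CREATE MESSAGE TYPE ItemChangedMessage VALIDATION = WELL_FORMED_XML;

    CREATE CONTRACT ItemChangeContract
        (ItemChangedMessage SENT BY INITIATOR);

    -- Publisher side.
    CREATE QUEUE ItemPublisherQueue;
    CREATE SERVICE ItemPublisherService ON QUEUE ItemPublisherQueue;

    -- One queue/service pair per subscriber, e.g. the eBay application.
    CREATE QUEUE EbaySubscriberQueue;
    CREATE SERVICE EbaySubscriberService
        ON QUEUE EbaySubscriberQueue (ItemChangeContract);
    GO

    -- Publishing a change: one conversation per subscriber.
    DECLARE @handle UNIQUEIDENTIFIER;
    BEGIN DIALOG CONVERSATION @handle
        FROM SERVICE ItemPublisherService
        TO SERVICE 'EbaySubscriberService'
        ON CONTRACT ItemChangeContract
        WITH ENCRYPTION = OFF;

    SEND ON CONVERSATION @handle
        MESSAGE TYPE ItemChangedMessage
        (N'<ItemChanged ItemId="12345" />');
    GO

    -- Subscriber side: drain the queue (in practice inside
    -- WAITFOR (RECEIVE ...), TIMEOUT n).
    DECLARE @dlg UNIQUEIDENTIFIER, @type SYSNAME, @body XML;
    RECEIVE TOP (1)
        @dlg  = conversation_handle,
        @type = message_type_name,
        @body = CAST(message_body AS XML)
    FROM EbaySubscriberQueue;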

TL;DR:

Goal: Update various systems in a dynamic, loosely coupled way when single items change.

Question: Is a pub/sub style solution implemented with Service Broker a viable solution in a high volume setting?

Bort
  • Two questions: why do those applications use data in main items table instead of processing pending changes in staging table? Second: do you absolutely need SQL Server to hold and publish the items? – Kuba Wyrostek Jun 29 '12 at 20:49
  • Take a look at built-in SQL replication; it is flexible enough to accommodate a broad range of scenarios. – alexm Jun 29 '12 at 20:55
  • @KubaWyrostek that was our first thought, and what led us to consider the message-based approach. Rather than read from that table every so often, why not use it as a source of 'Item Changed' messages? Maybe an unnecessary layer of abstraction, but since the staging table gets periodically wiped of processed updates, we thought the safest and most flexible option would be Service Broker. – Bort Jun 29 '12 at 20:56
  • @KubaWyrostek as for moving the items out of SQL, what alternatives do you suggest? – Bort Jun 29 '12 at 20:58
  • But if I guess correctly the real source of "Item Changed" is actually a process that fills up the staging table and SQL Server is used merely as a storage (apart from reporting, transaction isolation etc.). Is that so? – Kuba Wyrostek Jun 29 '12 at 20:59
  • Do these applications need to write back to items table or do these only consume information? – Kuba Wyrostek Jun 29 '12 at 21:07
  • @KubaWyrostek the entries of the staging table are the end result of a bunch of in-SQL business logic, based on updates to a number of tables. The applications don't need to update, only consume. – Bort Jun 30 '12 at 20:42

3 Answers


This is a very common integration pattern.

I am not familiar with SQL Server Service Broker, but if you look at any integration middleware stack - TIBCO, Oracle, webMethods, Mule, Informatica, etc. - they all offer a "database adapter" which performs the task you describe.

The general pattern is that updates in the database trigger a message publication. This is done either via a trigger in the database table or via the adapter "polling" the table for new updates. Either method has pros and cons.
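
In SQL Server terms, the trigger variant might look something like the rough sketch below (all names are placeholders, and the Service Broker message type, contract, and services are assumed to already exist):

    -- Sketch of the trigger-based variant (placeholder names).
    CREATE TRIGGER trg_Items_PublishChange
    ON dbo.Items
    AFTER UPDATE
    AS
    BEGIN
        SET NOCOUNT ON;

        -- One message carrying the ids of every row touched by this update.
        DECLARE @body XML =
            (SELECT ItemId
             FROM inserted
             FOR XML PATH('ItemChanged'), ROOT('Changes'), TYPE);

        IF @body IS NOT NULL
        BEGIN
            DECLARE @handle UNIQUEIDENTIFIER;
            BEGIN DIALOG CONVERSATION @handle
                FROM SERVICE ItemPublisherService
                TO SERVICE 'EbaySubscriberService'
                ON CONTRACT ItemChangeContract
                WITH ENCRYPTION = OFF;

            SEND ON CONVERSATION @handle
                MESSAGE TYPE ItemChangedMessage (@body);
        END;
    END;

The trade-off: the SEND runs inside the writing transaction, so delivery is transactional with the data change, but every update pays for it; polling leaves the write path untouched at the cost of latency.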

The primary benefit of this pattern is (as you suspect) more frequent and timely updates to other systems - a more "real-time" way of doing business. In addition, if you transform to a canonical message format, you get looser coupling between systems and therefore less pain when systems need to be changed or updated.

scaganoff

It's not a common database pattern because most relational databases are woefully ill-equipped to do it out of the box. SQL Server would be, too, if it didn't have Service Broker.

One of the developers of Service Broker talks about how Service Broker handles the challenges that most companies turn to NoSQL solutions for.

It's my understanding that Service Broker is meant to do exactly what you're looking for. It can even use External Activation to run custom applications when the data has changed.
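
For example, wiring that up might look roughly like this (a sketch only; the queue and service names are placeholders): an event notification forwards QUEUE_ACTIVATION events to a service that the External Activator watches.

    -- Sketch: route QUEUE_ACTIVATION events from the subscriber queue
    -- to a service monitored by the External Activator (placeholder names).
    CREATE QUEUE ExternalActivatorQueue;
    CREATE SERVICE ExternalActivatorService
        ON QUEUE ExternalActivatorQueue
        ([http://schemas.microsoft.com/SQL/Notifications/PostEventNotification]);

    CREATE EVENT NOTIFICATION EbayQueueActivation
        ON QUEUE EbaySubscriberQueue
        FOR QUEUE_ACTIVATION
        TO SERVICE 'ExternalActivatorService', 'current database';

The External Activator itself (a separate service shipped in the SQL Server Feature Pack) then launches whatever executable you configure - for example, the eBay sync application - when messages arrive.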

dpw
  • Intuitively speaking: Service Broker would add a lot of overhead to this relatively simple problem, compared to a simple query over a timestamped, clustered index of changes. Even one-way replication (as alexm suggested) seems a better solution. The more universal the tool, the bigger the performance penalty, IMHO. – Kuba Wyrostek Jun 30 '12 at 12:57
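
For comparison, the polling approach described in the comment above might look like this minimal sketch, assuming a rowversion column on the items table and a small state table holding each consumer's high-water mark (all names are hypothetical):

    -- Polling sketch (hypothetical names): scan for rows changed since
    -- the last run, using an indexed rowversion column.
    DECLARE @last BINARY(8);
    SELECT @last = LastVersion FROM dbo.ConsumerState WHERE Consumer = 'ebay';

    SELECT i.ItemId, i.Quantity, i.Price, i.RowVer
    FROM dbo.Items AS i
    WHERE i.RowVer > @last
    ORDER BY i.RowVer;

    -- After processing, advance the high-water mark. With concurrent
    -- writers, a production version should cap the mark at
    -- MIN_ACTIVE_ROWVERSION() so in-flight changes are not skipped.
    UPDATE dbo.ConsumerState
    SET LastVersion = (SELECT MAX(RowVer) FROM dbo.Items)
    WHERE Consumer = 'ebay';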

Usually these kinds of problems end up getting larger (in scope) than what a database-focused pub/sub implementation can provide. If I can make a recommendation, consider using a full-scale service bus:

nServiceBus (Free up to a certain size, pretty cheap past that)

You get a solid pub/sub framework that is easy to work with, and there is a large user base of customers, articles, and samples.

JasonRShaver