
I have a WCF service that processes a feed of tens of thousands of records from SAP. The service call takes an XElement as its main parameter and processes the XML to update records in our database. The current intent is to have the WCF service be called asynchronously, and to have the service call send back to the caller the same document with statuses for each record processed.

I'm also looking into ways to multithread the processing of the data, though this may not end up buying me anything.

Because this could take a while, I'm concerned about what will happen if the WCF service dies, gets restarted, etc. I need to know which records I've processed and which I haven't, and be able to complete processing on the remaining records.

The best I've been able to come up with is to update each node with a status (I have to do this, anyway, to send back to the caller), and save this file to the hard drive. But saving a file that large potentially 100,000 times doesn't really seem feasible.

What other strategies could I use to track these records as I process them?

TIA!
James

razlebe
James King
  • I'm going to mix wageoghe's answer with the MSMQ answer from jpmcclung... I'd mark them both as the answer if I could, but I think the database is the more critical of the two pieces. Thanks all! – James King Dec 13 '10 at 21:52

4 Answers


I see MSMQ as a great way to fulfill most of the needs you outlined above, if you broke the nodes into messages and put them on a transactional queue:

  • Scaling the processing of the data would be easier: once you maxed out the capabilities of one machine, you could have more machines processing from the queue.
  • If WCF "dies, gets restarted, etc.", you don't lose anything.
  • The real problem you will have with this scenario is getting the client to figure out how far along the service is in the processing. Queue messages are one-way only, so you would probably need another service call that reports the status of processing the queue.
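A minimal sketch of the first step, breaking the feed's nodes into one message per record, written in Python for illustration (the real service would be C#/WCF). The `<record id="...">` element name, the `id` key and `split_feed` are hypothetical; printing stands in for sending each message to a transactional MSMQ queue, which is not shown.

```python
import xml.etree.ElementTree as ET

def split_feed(xml_text):
    """Split a feed into one (record_id, body) message per record node.

    Hypothetical schema: each record is a <record id="..."> child of the root.
    """
    root = ET.fromstring(xml_text)
    messages = []
    for node in root.findall("record"):
        # The record's own XML becomes the message body, so each message can be
        # processed (and retried) independently; strip() drops surrounding whitespace.
        body = ET.tostring(node, encoding="unicode").strip()
        messages.append((node.get("id"), body))
    return messages

feed = """<feed>
  <record id="1001"><name>Widget</name></record>
  <record id="1002"><name>Gadget</name></record>
</feed>"""

for record_id, body in split_feed(feed):
    # In the real service each pair would be sent to a transactional queue.
    print(record_id, body)
```

Each message is small and self-describing, which is what lets a crashed worker simply pick up the next unconsumed message on restart.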

Links to MSMQ WCF how-to's:

http://msdn.microsoft.com/en-us/library/ms789048.aspx

http://code.msdn.microsoft.com/msmqpluswcf

jpmcclung
  • MSMQ was the first thing I thought about when I said "guaranteed processing" : ) I knew it could be tied to WCF, but I wasn't sure how, thanks for the links! My biggest question will be what is the overhead going to cost me to create 100,000 messages and send them to the queue (adding more machines isn't going to happen), and I still have the issue that on the other side, if I process the records one at a time, I have 100,000 queries to the database, and 100,000 datasets. Have any ideas for those two issues? – James King Dec 10 '10 at 18:14
  • 1
    +1 for queueing. You could have the service add an acknowledgement record to another queue to be read by your client, in order to figure out what had been processed and what hadn't. But - if the only reason you wanted to know what's been processed and what hasn't is so that you can recover from a service failure/shutdown, then using a transactional queue takes this worry away anyhow. – razlebe Dec 10 '10 at 18:14
  • jpmcclung> You answer also may help me here: http://stackoverflow.com/questions/4411745/max-parameter-size-for-wcf-service/4411952#4411952 Feel free to post this answer or a similar answer there : ) – James King Dec 10 '10 at 18:26
  • Well you could have the service process the messages off the queue into your dataset. You would have to be sure everything was thread safe, but you really take away a lot of the fault tolerance you just built into the whole process if you do that. That is unless you decide to persist your dataset to disk somewhere which brings us back to your original problem. – jpmcclung Dec 10 '10 at 18:38
  • When I worked with MSMQ directly last (pre-WCF and even .NET days), I could work with a message but leave it on the queue till processed. If it works similarly with WCF, I would take wageoghe's idea and update a temp table for processing, after which I would acknowledge/remove the MSMQ message. This would guarantee the data is persisted. – James King Dec 10 '10 at 18:44
  • If I go with MSMQ/WCF, it sounds like I lose the ability to host in IIS, correct? – James King Dec 10 '10 at 18:45
  • 1
    I haven't hosted in IIS myself but I did see this when I was researching it: http://blogs.msdn.com/b/tomholl/archive/2008/07/12/msmq-wcf-and-iis-getting-them-to-play-nice-part-1.aspx – jpmcclung Dec 10 '10 at 18:49
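razlebe's acknowledgement-queue idea (and the extra status call mentioned in the answer) can be sketched with Python's in-memory `queue.Queue` standing in for a second MSMQ queue. `worker`, `collect_status` and the `"OK"`/`"FAILED"` status strings are hypothetical; the point is only that the client learns progress by draining acknowledgements rather than asking the service directly.

```python
import queue

def worker(messages, ack_queue, process):
    """Process each (record_id, body) message and post one acknowledgement per record."""
    for record_id, body in messages:
        try:
            process(body)
            ack_queue.put((record_id, "OK"))
        except Exception as exc:  # report the failure instead of losing it
            ack_queue.put((record_id, f"FAILED: {exc}"))

def collect_status(all_ids, ack_queue, statuses):
    """Drain pending acknowledgements into `statuses` without blocking.

    Returns the record ids that have not been acknowledged yet.
    """
    while True:
        try:
            record_id, status = ack_queue.get_nowait()
        except queue.Empty:
            break
        statuses[record_id] = status
    return [rid for rid in all_ids if rid not in statuses]

# Usage: the service has processed only the first of two records so far.
acks = queue.Queue()
statuses = {}
messages = [("1001", "<record id='1001'/>"), ("1002", "<record id='1002'/>")]
worker(messages[:1], acks, process=lambda body: None)
still_pending = collect_status(["1001", "1002"], acks, statuses)
print(still_pending, statuses)
```

As razlebe notes, with a transactional work queue the unfinished records are simply the ones still on it, so this is mainly useful for reporting progress to the client.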

Maybe you could put the records (from your XML) in your database first, perhaps in a special "records to be processed" table. Each row might also be tagged with some way to correlate it with a specific request. Then process the rows from the database. As you process each one, update its status field (corresponding to the node status that you would have updated on the XElement). When you are finished, you could either go back and update the XML (if you haven't crashed in the meantime) or generate new XML (which could be problematic if you can't round-trip the conversion XML->database->XML).

If the service dies, it should be relatively simple to examine the database to find the records that have not been processed and finish processing them.
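A minimal sketch of the staging-table approach, using Python's built-in `sqlite3` as a stand-in for the real database. The `staging` table, its columns and the `'pending'` status are hypothetical names; each status update is committed separately, so a second call after a crash picks up only the rows that never finished.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS staging (
    request_id TEXT NOT NULL,
    record_id  TEXT NOT NULL,
    payload    TEXT NOT NULL,
    status     TEXT NOT NULL DEFAULT 'pending',
    PRIMARY KEY (request_id, record_id)
)"""

def stage_feed(conn, request_id, records):
    """Persist every (record_id, payload) once, up front, in one transaction."""
    with conn:
        conn.executemany(
            "INSERT OR IGNORE INTO staging (request_id, record_id, payload) "
            "VALUES (?, ?, ?)",
            [(request_id, rid, payload) for rid, payload in records],
        )

def process_pending(conn, request_id, handler):
    """Run `handler` on rows still marked 'pending'; safe to re-run after a crash."""
    rows = conn.execute(
        "SELECT record_id, payload FROM staging "
        "WHERE request_id = ? AND status = 'pending' ORDER BY record_id",
        (request_id,),
    ).fetchall()
    for record_id, payload in rows:
        status = handler(payload)  # e.g. 'OK' or an error code
        with conn:  # commit each status so progress survives a crash
            conn.execute(
                "UPDATE staging SET status = ? "
                "WHERE request_id = ? AND record_id = ?",
                (status, request_id, record_id),
            )

# Usage: the "service" dies on the second record, then resumes.
conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
stage_feed(conn, "req-1", [(rid, f"<record id='{rid}'/>") for rid in ("1001", "1002", "1003")])

calls = []
def flaky_handler(payload):
    calls.append(payload)
    if len(calls) == 2:
        raise RuntimeError("service died")
    return "OK"

try:
    process_pending(conn, "req-1", flaky_handler)
except RuntimeError:
    pass  # simulated crash: 1001 is done, 1002 and 1003 are still pending

after_crash = conn.execute(
    "SELECT record_id FROM staging WHERE status = 'pending' ORDER BY record_id"
).fetchall()
process_pending(conn, "req-1", lambda payload: "OK")  # restart picks up the rest
```

One caveat the sketch glosses over: if the handler writes to the target tables and the service dies before the status update commits, that record is processed again, so the handler should be idempotent or share a transaction with the status update.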

Alternatively, you could write the XML file to disk once, keep a table in the database that holds ONLY the "status" field (plus one or more keys that let you find the corresponding record in the XML file again), process the records, and update the database "status" table as you go. When finished, update the status fields in the XML file in one fell swoop by reading the statuses from the "status" table.

Again, if the service dies, it should be simple enough to examine the "status" table to see which rows have been processed and which have not.
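The final "one fell swoop" merge might look like this, again with `sqlite3` standing in for the real database. The `record_status` table, the `<record id="...">` element and the `status` attribute are assumptions about the schema; records with no status row are marked `pending`.

```python
import sqlite3
import xml.etree.ElementTree as ET

def merge_statuses(xml_text, conn, request_id):
    """Copy every record's status from the status table into the XML in one pass.

    Records without a status row are marked 'pending'.
    """
    statuses = dict(conn.execute(
        "SELECT record_id, status FROM record_status WHERE request_id = ?",
        (request_id,),
    ))
    root = ET.fromstring(xml_text)
    for node in root.iter("record"):
        node.set("status", statuses.get(node.get("id"), "pending"))
    return ET.tostring(root, encoding="unicode")

# Usage: two of three records have a status recorded so far.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE record_status (request_id TEXT, record_id TEXT, status TEXT, "
    "PRIMARY KEY (request_id, record_id))"
)
with conn:
    conn.executemany(
        "INSERT INTO record_status VALUES (?, ?, ?)",
        [("req-1", "1001", "OK"), ("req-1", "1002", "ERROR")],
    )
feed = '<feed><record id="1001"/><record id="1002"/><record id="1003"/></feed>'
result = merge_statuses(feed, conn, "req-1")
print(result)
```

Because the XML is read and written once at the end, the per-record cost during processing is a single small row update rather than a rewrite of the whole file.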

Good luck!

wageoghe
  • I've given serious thoughts to doing exactly this, and at the moment, I'm leaning toward it as the best option. I still have to guarantee that I persist the xml, or let the caller know I didn't process it. Using jpmcclung's solution of MSMQ might guarantee the xml feed isn't lost, I have to look more into it. Then, once the service has added records to the database, I'd pull the message from MSMQ. – James King Dec 10 '10 at 18:21
  • What I'd have to figure out is a way to asynchronously fire off a thread to process the records in the temp table... the issue being that the thread has to continue running after the wcf call is complete. – James King Dec 10 '10 at 18:21
  • I'm going to mix this with the MSMQ answer from jpmcclung... I'd mark them both as the answer if I could, but the database is the more critical piece for me, I think. Thanks all! – James King Dec 13 '10 at 21:49
  • Could you please answer http://stackoverflow.com/questions/9702379/queuing-in-oneway-wcf-messages-using-windows-service-and-sql-server ? – LCJ Mar 15 '12 at 13:39

If your source and destination databases are SQL Server, then you should forget about middle-men and go straight to the built-in queuing support in the database: Service Broker. You get a number of advantages over MSMQ:

  • High Availability. Service Broker is built into the database, so the database high availability and disaster recoverability solution you already have implemented will automatically pick up your messaging solution too. Your cluster or database mirroring solution will work out-of-the-box and the messaging will fail-over transparently with the database failover.
  • Recovery consistency. Having your messages and your data in the same recovery unit (the 'database') allows for simple backup-restore. With messages stored in MSMQ and data stored in the database, it is not possible to have a consistent backup unless you freeze processing.
  • Routing. SSB allows queues to move to new physical locations without interrupting the message stream. See Service Broker Routing.
  • Increased capacity. MSMQ has a very small size limit (4GB per queue), which can be quickly overrun in production, with disastrous results. The SSB limit is 2GB per message, and queue size is limited only by the database size.
  • Significantly higher throughput, due to local transactions instead of distributed transactions. With MSMQ you must enlist the database and MSMQ in a distributed transaction, both at the end where you enqueue and at the end where you dequeue. This dramatically reduces throughput in the MSMQ case.
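The consistency and throughput points both come from the queue and the data living in one database, so a dequeue and the update it triggers can share one local transaction. The sketch below is not Service Broker (which uses T-SQL `RECEIVE`); it uses a plain SQLite table, `work_queue` (hypothetical, as are `products` and `receive_and_apply`), to show that property: if the update fails, the rollback also restores the message. It assumes a single worker; concurrent readers would need the locking a real queue's receive operation is designed to provide.

```python
import sqlite3

def receive_and_apply(conn):
    """Dequeue one message and apply it in a single local transaction.

    Returns False when the queue is empty.
    """
    with conn:  # commits on success, rolls back on any exception
        row = conn.execute(
            "SELECT msg_id, record_id, new_name FROM work_queue ORDER BY msg_id LIMIT 1"
        ).fetchone()
        if row is None:
            return False  # queue is empty
        msg_id, record_id, new_name = row
        conn.execute("DELETE FROM work_queue WHERE msg_id = ?", (msg_id,))
        # A NULL name violates products.name NOT NULL and raises IntegrityError,
        # which rolls back the DELETE above as well.
        conn.execute(
            "UPDATE products SET name = ? WHERE record_id = ?", (new_name, record_id)
        )
    return True

# Usage: one good message, one that fails while updating the data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE work_queue (msg_id INTEGER PRIMARY KEY, record_id TEXT NOT NULL, new_name TEXT)")
conn.execute("CREATE TABLE products (record_id TEXT PRIMARY KEY, name TEXT NOT NULL)")
with conn:
    conn.executemany("INSERT INTO products VALUES (?, ?)", [("1001", "old"), ("1002", "old")])
    conn.executemany(
        "INSERT INTO work_queue (record_id, new_name) VALUES (?, ?)",
        [("1001", "Widget"), ("1002", None)],
    )

first = receive_and_apply(conn)  # applies 1001 and removes its message together
try:
    receive_and_apply(conn)      # 1002 fails; its message stays queued
    failed = False
except sqlite3.IntegrityError:
    failed = True
```

With the queue in a separate store, getting the same all-or-nothing behaviour is what requires the distributed transaction described above.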

There are other advantages too.

The one thing you lose is the WCF service model programming. WCF indeed makes it extremely easy to write demo apps, and you'll lose that.

Remus Rusanu
  • The source is a mix of SAP and BizRights; a nightly process runs to merge related records from these two sources and build the xml. Processing the rest of your points... – James King Dec 10 '10 at 18:41
  • If the source is *not* a SQL Server (it could be SAP running on SQL), then SSB loses much of its appeal, as you'd require a SQL instance to send the messages *from* (SSB has zero interoperability). Installing a local SQL Express *could* work, but it is nowhere near as tempting as when the source of the data *is* SQL Server. – Remus Rusanu Dec 10 '10 at 18:46
  • BTW, even if you decide on WCF, you still need a way to reliably execute the database processing part of an incoming WCF call: http://rusanu.com/2009/08/05/asynchronous-procedure-execution/ – Remus Rusanu Dec 10 '10 at 18:50
  • Could you please answer http://stackoverflow.com/questions/9702379/queuing-in-oneway-wcf-messages-using-windows-service-and-sql-server ? – LCJ Mar 15 '12 at 13:40

Have you considered a messaging server, such as Microsoft Message Queuing?