
This is an alternative approach to solving this problem, not a duplicate.

I want my Azure role to reprocess data in case of sudden failures. I am considering the following option.

For every block of data to process I have a database table row, and I could add a column meaning "time of last ping from a processing node". When a node grabs a data block for processing, it sets the state to "processing" and that time to the current time, and it is then the node's responsibility to update that time, say, once a minute. Periodically some node will ask for "all blocks that are in the processing state and whose last ping is more than ten minutes old", consider those blocks abandoned, and queue them for reprocessing.
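To make the scheme concrete, here is a minimal in-memory sketch of the claim / ping / detect cycle. All names (`Block`, `claim`, `ping`, `find_abandoned`, `requeue`, `PING_TIMEOUT`) are hypothetical, since I have not settled on a schema, and the current time is passed in by the caller, which is exactly where the clock question comes from.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional

# Hypothetical names; the ten-minute timeout is the figure from the description above.
PING_TIMEOUT = timedelta(minutes=10)


@dataclass
class Block:
    block_id: int
    state: str = "pending"               # "pending" or "processing"
    last_ping: Optional[datetime] = None


def claim(block: Block, now: datetime) -> None:
    """A node grabs a block: mark it as processing and record the first ping."""
    block.state = "processing"
    block.last_ping = now


def ping(block: Block, now: datetime) -> None:
    """Called by the owning node roughly once a minute while it is working."""
    block.last_ping = now


def find_abandoned(blocks: List[Block], now: datetime) -> List[Block]:
    """Blocks still in 'processing' whose last ping is older than the timeout."""
    return [
        b for b in blocks
        if b.state == "processing"
        and b.last_ping is not None
        and now - b.last_ping > PING_TIMEOUT
    ]


def requeue(block: Block) -> None:
    """Put an abandoned block back so another node can pick it up."""
    block.state = "pending"
    block.last_ping = None
```

Every function takes `now` from whoever calls it, so the result of `find_abandoned` depends on the callers' clocks agreeing with each other.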

I have one very serious concern. The above approach requires that all nodes have more or less the same clock time, and it looks like I should not make assumptions about global time.

But all nodes talk to the very same database. What if I use that database's time? With functions like GETUTCDATE() in the SQL requests I would do exactly what I planned, but now I would not need to care whether the nodes' clocks are in sync, because they would all use the database time.
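A rough sketch of what I mean, with every timestamp produced inside the SQL statement rather than by the node. Assumptions: SQLite is used here only so that the example can run anywhere, so `datetime('now')` stands in for `GETUTCDATE()`, and the table, column, and function names are hypothetical.

```python
import sqlite3

# Assumption: SQLite stands in for the real database. On SQL Server / SQL Azure the
# clock expression would be GETUTCDATE() instead of SQLite's datetime('now').
# Table, column, and function names are hypothetical.


def open_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE blocks ("
        " id INTEGER PRIMARY KEY,"
        " state TEXT NOT NULL DEFAULT 'pending',"
        " last_ping TEXT)"
    )
    return conn


def claim_block(conn: sqlite3.Connection, block_id: int) -> bool:
    """Claim a pending block; the timestamp comes from the database clock."""
    cur = conn.execute(
        "UPDATE blocks SET state = 'processing', last_ping = datetime('now')"
        " WHERE id = ? AND state = 'pending'",
        (block_id,),
    )
    conn.commit()
    return cur.rowcount == 1


def ping_block(conn: sqlite3.Connection, block_id: int) -> None:
    """Refresh the ping time of a block that is being processed."""
    conn.execute(
        "UPDATE blocks SET last_ping = datetime('now')"
        " WHERE id = ? AND state = 'processing'",
        (block_id,),
    )
    conn.commit()


def find_abandoned_blocks(conn: sqlite3.Connection, timeout_minutes: int = 10):
    """Ids of processing blocks whose ping is older than the timeout, per the DB clock."""
    rows = conn.execute(
        "SELECT id FROM blocks WHERE state = 'processing'"
        " AND last_ping < datetime('now', ?) ORDER BY id",
        (f"-{timeout_minutes} minutes",),
    ).fetchall()
    return [row[0] for row in rows]
```

The nodes never send a time value of their own, so both the writes and the "older than ten minutes" comparison are evaluated against the same clock.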

Will this approach work reliably if I use the database time functions?

sharptooth

1 Answer


As a general rule this approach should work. If there is only one source for the current time then time synchronisation issues are not going to affect you.

But you have to consider what happens if your database server goes down. Azure will switch you over to another copy, on another server, in another rack, and the clock on that server is not guaranteed to be synchronised with the clock on the original one.
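To illustrate why that matters, here is a small arithmetic sketch. The function names and the skew values are hypothetical; I have no figures for how far apart two replicas' clocks can actually be. A ping written by the old server is later compared against the new server's clock, so its apparent age is the true age plus the skew between the two clocks.

```python
from datetime import datetime, timedelta


def apparent_age(last_ping: datetime, true_now: datetime, skew: timedelta) -> timedelta:
    """Age of a ping as seen by a server whose clock is `skew` ahead of the
    server that wrote `last_ping` (a negative skew means it is behind)."""
    return (true_now + skew) - last_ping


def looks_abandoned(
    last_ping: datetime,
    true_now: datetime,
    skew: timedelta,
    timeout: timedelta = timedelta(minutes=10),
) -> bool:
    """True if the block would be flagged as abandoned after the failover."""
    return apparent_age(last_ping, true_now, skew) > timeout
```

With a positive skew, a block pinged a minute ago can look abandoned; with a negative skew, a block that really has been abandoned can go undetected for longer than the timeout.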

Also, this approach will undoubtedly get you into trouble if you ever have to scale out your database.

I think I still prefer the queue-based approach.

David Steele