
To detect divergent information I need to track modifications. The mechanism only needs to detect whether divergence happened or not.

One idea is to use a modification counter along with a running checksum into which the modification operations are compressed.

So oldChecksum + checksum(operation) => newChecksum. The idea is that it is quite unlikely for two different sequences of operations to keep producing the same checksum at the same modification count.
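To make the idea concrete, here is a minimal sketch of such a chained checksum in Python. It is only one possible reading of the scheme: the names `next_checksum` and `ReplicaState` are hypothetical, and SHA-256 is an assumed choice of hash. Note that the `+` is implemented as hashing the concatenation rather than as arithmetic addition, because plain addition is commutative and would not notice operations applied in a different order.

```python
import hashlib


def next_checksum(old_checksum: bytes, operation: bytes) -> bytes:
    """Fold one operation into the running checksum (order-sensitive)."""
    op_digest = hashlib.sha256(operation).digest()
    # Hash the concatenation instead of adding the values, so that
    # applying the same operations in another order gives another result.
    return hashlib.sha256(old_checksum + op_digest).digest()


class ReplicaState:
    """Hypothetical per-replica stamp: modification count plus running checksum."""

    def __init__(self) -> None:
        self.count = 0
        self.checksum = b"\x00" * 32  # assumed fixed starting value

    def apply(self, operation: bytes) -> None:
        self.checksum = next_checksum(self.checksum, operation)
        self.count += 1

    def stamp(self) -> tuple:
        return (self.count, self.checksum)
```

Two replicas that applied the same operations in the same order end up with equal stamps; comparing `stamp()` values at the same count is then the divergence check, still subject to the small chance of a hash collision.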

Is this a good way to do this?

What is the best way to combine two checksums into a new checksum that still has the properties of a checksum?

I even thought about creating UUIDs to stamp each operation, making those unique instead of using checksums. Is this an option?
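As a rough sketch of the UUID variant, assume the master attaches a random UUID to every operation and each replica keeps the list of stamps it has applied. The helper names `new_stamp` and `first_divergence` are made up for illustration. Comparing only the latest stamp would miss a divergence that is followed by identical operations, so this sketch compares the logs position by position. Random UUIDs are themselves only unique with very high probability, so this does not remove the element of chance either.

```python
import uuid


def new_stamp() -> uuid.UUID:
    """Stamp for one operation, assumed to be generated by the master."""
    return uuid.uuid4()


def first_divergence(log_a, log_b):
    """Return the index of the first differing stamp, or None if the
    common prefix of both logs is identical."""
    for index, (stamp_a, stamp_b) in enumerate(zip(log_a, log_b)):
        if stamp_a != stamp_b:
            return index
    return None
```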

Update:

On request, here is an explanation of the actual problem. The information is a data store. An operation is simply a modification of the data. The modifications are sent by a master. If a master fails, a new master takes over. If the data store ends up in a situation where replicas have different masters (split brain, for instance) and receive different operations, then on merge the divergent replicas must be scrapped and replaced. To detect those I need a mechanism, and I guess that combining the checksums of each operation is sufficient.

The algorithm will use multiple checksums of the last n operations to decrease the likelihood of coincidences (though I dislike the idea of relying on chance here).
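A possible shape for the "last n checksums" part, as a sketch only: each replica keeps a bounded window of `(count, checksum)` pairs, and two replicas are compared on every modification count they both still remember. `ChecksumWindow` and `diverged` are hypothetical names, and the checksums are treated as opaque bytes.

```python
from collections import deque


class ChecksumWindow:
    """Keeps the (count, checksum) pairs of the last n operations."""

    def __init__(self, n: int) -> None:
        self.entries = deque(maxlen=n)  # oldest entries drop out automatically

    def record(self, count: int, checksum: bytes) -> None:
        self.entries.append((count, checksum))


def diverged(a: ChecksumWindow, b: ChecksumWindow):
    """True if any shared count has different checksums, False if all
    shared counts agree, None if the windows do not overlap at all."""
    theirs = dict(b.entries)
    shared = [(count, checksum) for count, checksum in a.entries if count in theirs]
    if not shared:
        return None  # nothing to compare; the caller has to decide
    return any(theirs[count] != checksum for count, checksum in shared)
```

Requiring agreement on several counts at once lowers the chance that a collision hides a divergence, but it remains a probabilistic check.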


PS: If someone knows a way to detect divergent replicas that does not merely reduce the likelihood of an undetected divergent state, I would like to know it. All concepts I know of or have come up with use some combination of sequence states (time, counters, hashes) and only reduce the chance of an undetected collision. Any ideas?

Martin Kersten
  • Most checksum algorithms are [streaming](http://stackoverflow.com/questions/2214259/combining-md5-hash-values), in that you can add new content as it comes down and only request the digest when you need it. For example, if you decide to use md5, you would do something like this: `hash = md5(); hash.add(content1); checksum1 = hash.digest(); hash.add(content2); checksum2 = hash.digest();`. Now compare checksum1 and checksum2. – Matthew King Jul 05 '15 at 22:18
  • I think this greatly depends on how checksum(operation) works. If it becomes too complicated, maybe just check-sum the entire data? If it is composite, you could use a tree of checksums and compute a checksum of the tree. – 5gon12eder Jul 05 '15 at 22:19
  • So you're trying to detect changes in the document. You've asked if the information is "divergent". Divergent from what? Could you provide a more accurate explanation of what it is you're trying to do? – christopher Jul 05 '15 at 22:21
  • Updated the question with the actual scenario. – Martin Kersten Jul 05 '15 at 22:42
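The streaming approach from the first comment could look roughly like this with Python's `hashlib`. SHA-256 is swapped in for the md5 named in the comment, `update` is the `hashlib` counterpart of the comment's `add`, and the content values are placeholders.

```python
import hashlib

# One hash object is fed incrementally; a digest can be requested at any
# point without resetting the accumulated state.
running = hashlib.sha256()

running.update(b"content1")
checksum1 = running.hexdigest()  # covers content1 only

running.update(b"content2")
checksum2 = running.hexdigest()  # covers content1 followed by content2
```

Here `checksum2` equals the digest of the concatenated contents, so each intermediate digest acts as a checksum over the whole operation history up to that point.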

0 Answers