I'm building an application in which I need to compute a cryptographic hash of the database contents. The hash needs to be deterministic. That is, I will have different instances of the database, which should be identical and should therefore produce identical hashes. Conversely, if they are not identical, the hashes should not match.
I'm considering storing my data in LokiJS, but if I do, I'm not sure how I can compute the hash without extracting ALL the data and running my hash algorithm on it, which would be prohibitively expensive. Wouldn't it be cool if there was a "get hash" function that would return a hash of the database content? Don't suppose such a thing exists, does it?
If this doesn't exist (and I haven't seen any mention of it in the documentation, so it probably doesn't) then I was thinking of adding a "shim" in front of the database, where I would maintain the hashes of all the objects in the database. So for example:
function add(myObject, myCollection, myHashCollection) {
  // Hash the object, then store the object and its hash under the same unique key.
  const hash = myObject.computeHash();
  myCollection.insert(myObject);
  myHashCollection.insert({ key: myObject.someUniqueValue, value: hash });
}
And so on: similar functions for update and delete would, given an object, update or delete it in myCollection and also update or delete the corresponding hash value in myHashCollection.
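A minimal sketch of what such a shim might look like, with add, update and remove kept in step. To keep it runnable on its own, plain Map objects stand in for myCollection and myHashCollection, and the names HashedStore, hashObject, canonicalize and the `id` key field are all hypothetical. It also assumes objects hold only JSON-safe values, and it sorts property names before hashing so that two objects with the same content but different property order get the same hash.

```javascript
const crypto = require("crypto");

// Hypothetical helper: rebuild a value with object keys in sorted order,
// so JSON.stringify output does not depend on property insertion order.
function canonicalize(value) {
  if (Array.isArray(value)) return value.map(canonicalize);
  if (value !== null && typeof value === "object") {
    const sorted = {};
    for (const k of Object.keys(value).sort()) sorted[k] = canonicalize(value[k]);
    return sorted;
  }
  return value;
}

// SHA-256 over the canonical JSON form of the object.
function hashObject(obj) {
  return crypto
    .createHash("sha256")
    .update(JSON.stringify(canonicalize(obj)))
    .digest("hex");
}

class HashedStore {
  constructor() {
    this.objects = new Map(); // stand-in for myCollection
    this.hashes = new Map();  // stand-in for myHashCollection
  }

  add(obj) {
    if (this.objects.has(obj.id)) throw new Error(`duplicate key: ${obj.id}`);
    this.objects.set(obj.id, obj);
    this.hashes.set(obj.id, hashObject(obj));
  }

  update(obj) {
    if (!this.objects.has(obj.id)) throw new Error(`unknown key: ${obj.id}`);
    this.objects.set(obj.id, obj);
    this.hashes.set(obj.id, hashObject(obj));
  }

  remove(id) {
    this.objects.delete(id);
    this.hashes.delete(id);
  }
}
```

The important property is that every write path goes through the shim; any code that touches the object collection directly would leave the stored hashes stale.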
Then I could write a getHash() function which would combine ("munge together") the hashes in myHashCollection in a fixed order and return a root hash.
If I had this, I could very quickly compare the root hashes from different instances of my application to know immediately if they have identical data.
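One way the root hash could be computed, sketched under the assumption that the per-object hashes are available as a Map from a unique string key to a hex digest (the function name rootHash is hypothetical). The entries are sorted by key before hashing, because two instances that inserted the same objects in a different order must still produce the same root.

```javascript
const crypto = require("crypto");

// hashes: Map<string, string> of unique key -> hex digest of that object.
function rootHash(hashes) {
  const h = crypto.createHash("sha256");
  // Sort keys so the result does not depend on insertion order.
  const keys = [...hashes.keys()].sort();
  for (const key of keys) {
    // Length-prefix the key so adjacent fields cannot run together ambiguously.
    h.update(`${Buffer.byteLength(key)}:${key}:${hashes.get(key)}\n`);
  }
  return h.digest("hex");
}

// Usage: same entries, different insertion order.
const a = new Map([["x", "aa"], ["y", "bb"]]);
const b = new Map([["y", "bb"], ["x", "aa"]]);
console.log(rootHash(a) === rootHash(b)); // true
```

This version rereads every stored hash on each call, so it costs time proportional to the number of objects (plus the sort), although it never touches the objects themselves. A tree of hashes would reduce the work per change to roughly the depth of the tree.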
Also, I could detect errors in persist/restore. I would persist the object data and persist just the root hash. Upon restoring my objects from the persistent store, I would iterate through ALL the data, recompute the hashes, munge them together, and I should get the same value as the stored root hash. If not, I know something went wrong. Similar logic could be used to validate that the data held in memory has not been corrupted.
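The restore check could be sketched like this. Everything here is an assumption for illustration: the names canonicalJson, computeRoot and verifyRestore, the keyOf callback that extracts the unique key, and the restriction to JSON-safe values. Whatever combining scheme is used when the root is persisted has to be reused unchanged at restore time.

```javascript
const crypto = require("crypto");

const sha256 = (s) => crypto.createHash("sha256").update(s).digest("hex");

// Serialize with object keys sorted, so the text (and hence the hash)
// is the same before and after a persist/restore round trip.
function canonicalJson(value) {
  if (Array.isArray(value)) return `[${value.map(canonicalJson).join(",")}]`;
  if (value !== null && typeof value === "object") {
    const parts = Object.keys(value)
      .sort()
      .map((k) => `${JSON.stringify(k)}:${canonicalJson(value[k])}`);
    return `{${parts.join(",")}}`;
  }
  return JSON.stringify(value);
}

// Recompute every object hash and combine them in sorted order.
function computeRoot(objects, keyOf) {
  const leaves = objects
    .map((o) => `${JSON.stringify(keyOf(o))}:${sha256(canonicalJson(o))}`)
    .sort();
  return sha256(leaves.join("\n"));
}

// True if the restored data reproduces the root stored at persist time.
function verifyRestore(restoredObjects, storedRoot, keyOf) {
  return computeRoot(restoredObjects, keyOf) === storedRoot;
}

// Usage: persist, restore through JSON, then check.
const original = [{ id: "a", n: 1 }, { id: "b", n: 2 }];
const storedRoot = computeRoot(original, (o) => o.id);
const restored = JSON.parse(JSON.stringify(original)).reverse();
console.log(verifyRestore(restored, storedRoot, (o) => o.id)); // true
```

Note that an in-memory database may attach its own bookkeeping fields to stored documents; if those can differ between instances, they would need to be stripped before hashing.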
I could also write a Merkle tree algorithm that would very efficiently allow any particular object retrieved from the database to be checked against the root hash.
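A rough sketch of the Merkle tree idea, assuming the leaves are the per-object hex digests in a fixed order (for example sorted by key). The function names buildLevels, proofFor and verifyProof are hypothetical, and an unpaired node at the end of a level is simply promoted to the next level, which is one of several possible conventions.

```javascript
const crypto = require("crypto");

const sha256 = (s) => crypto.createHash("sha256").update(s).digest("hex");

// Build every level of the tree. levels[0] is the leaves; the last level is [root].
function buildLevels(leaves) {
  if (leaves.length === 0) throw new Error("need at least one leaf");
  const levels = [leaves];
  while (levels[levels.length - 1].length > 1) {
    const prev = levels[levels.length - 1];
    const next = [];
    for (let i = 0; i < prev.length; i += 2) {
      // Hash pairs together; an unpaired last node is promoted unchanged.
      next.push(i + 1 < prev.length ? sha256(prev[i] + prev[i + 1]) : prev[i]);
    }
    levels.push(next);
  }
  return levels;
}

// Collect the sibling hashes needed to recompute the root from one leaf.
function proofFor(levels, index) {
  const proof = [];
  for (let d = 0; d < levels.length - 1; d++) {
    const sibling = index ^ 1; // the other node in this pair
    if (sibling < levels[d].length) {
      proof.push({ hash: levels[d][sibling], left: sibling < index });
    }
    index = Math.floor(index / 2);
  }
  return proof;
}

// Recompute the path from the leaf to the root and compare.
function verifyProof(leaf, proof, root) {
  let acc = leaf;
  for (const step of proof) {
    acc = step.left ? sha256(step.hash + acc) : sha256(acc + step.hash);
  }
  return acc === root;
}

// Usage: check the third of five object hashes against the root.
const leaves = ["a", "b", "c", "d", "e"].map(sha256);
const levels = buildLevels(leaves);
const root = levels[levels.length - 1][0];
console.log(verifyProof(leaves[2], proofFor(levels, 2), root)); // true
```

Checking one object then needs only about log2(n) sibling hashes rather than all n. A hardened version would probably also hash leaves and internal nodes with distinct prefixes, so that an internal node cannot be passed off as a leaf.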
Am I crazy or does this sound reasonable?