
I am considering storing the history of changes to database records in a Git/Hg repository, while the current data stays in the database. If someone needs the history of changes, I would look it up in the repo.

There could be a folder for each collection, and the filename would mirror "_id" (the primary key), so I would look up the particular file for the relevant info. The assumption is that there is one primary key field (i.e. no composites). The database I am using is MongoDB, so records are in a document format anyway. I just need to store the JSON in a file.
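
To make the layout concrete, here is a minimal sketch of what I have in mind, assuming the `_id` is safe to use as a filename. The names `history_path` and `write_snapshot` are made up for illustration, and the actual `git add` / `git commit` step is left out.

```python
import json
from pathlib import Path


def history_path(repo_root, collection, doc_id):
    """Return <repo_root>/<collection>/<_id>.json for one record."""
    return Path(repo_root) / collection / f"{doc_id}.json"


def write_snapshot(repo_root, collection, document):
    """Write the document as stable JSON and return the file path."""
    path = history_path(repo_root, collection, document["_id"])
    path.parent.mkdir(parents=True, exist_ok=True)
    # sort_keys and indent keep line-based diffs small between versions.
    # default=str is a crude fallback (an assumption of this sketch) for
    # values such as ObjectId or datetime that json cannot serialize.
    text = json.dumps(document, sort_keys=True, indent=2, default=str)
    path.write_text(text + "\n", encoding="utf-8")
    return path
```

Each call overwrites the file for that record, so committing after every write would leave the versioning itself to the repository.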

Advantages, as I see them: version control systems are well suited to storing diffs/changes over time. I can query specific versions, and also list the changes from base -> current.

Disadvantages: I can't think of any, except that it is novel. IO access could be an issue if history is highly sought after (but not in my case). I am OK with slower retrieval of historical data. It is expected to be a rare event in any case.

So my question is: is there some obvious drawback that I am overlooking?

Thanks.

Clarification: The reason I want to do it this way is that I expect history to grow and potentially have to be kept for an infinite amount of time (ideally). I could improve this by keeping only what has changed, but it is extra effort and not trivial.
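
For reference, keeping only what has changed could start from something like the sketch below. It compares top-level fields only; the function name `shallow_diff` and the shape of the result are assumptions for illustration, and nested documents and arrays are where the non-trivial part begins.

```python
def shallow_diff(old, new):
    """Return the top-level fields that were added, removed or changed."""
    changes = {}
    for key in old.keys() | new.keys():
        if key not in new:
            changes[key] = {"op": "removed", "old": old[key]}
        elif key not in old:
            changes[key] = {"op": "added", "new": new[key]}
        elif old[key] != new[key]:
            # A nested document that differs anywhere is reported as a whole.
            changes[key] = {"op": "changed", "old": old[key], "new": new[key]}
    return changes
```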

An additional thing to consider is that the speed of MongoDB comes from having indexes in memory. If it has to maintain indexes for both actual + history, I will need 2x-3x more RAM than I would otherwise need. As of now, RAM is at a premium (it may not always be), but still.

Nasir

1 Answer


Instead of exporting data into JSON (via BSON, as that is what mongodump provides), I would suggest you create an "archive" collection for each collection in MongoDB. You can then move outdated versions of your documents to this archive collection in the same format that you have them in your normal collection. As long as you store a date with the documents, you can very easily retrieve earlier versions without having to use a cumbersome solution such as a Git/Hg repository. Comparing versions is something you will probably need to visualize anyway, so doing that with JSON representations of documents instead of the real documents isn't a real benefit.
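
A rough sketch of that idea, assuming `current` and `archive` are pymongo `Collection` objects. The function name and the `original_id` / `archived_at` field names are made up for illustration, and the two writes are not atomic.

```python
from datetime import datetime, timezone


def archive_and_replace(current, archive, new_doc):
    """Copy the existing version into `archive`, then store `new_doc`."""
    old_doc = current.find_one({"_id": new_doc["_id"]})
    if old_doc is not None:
        archived = dict(old_doc)
        # One record has many old versions, so the archive cannot reuse
        # the original _id; keep it in a separate (hypothetical) field.
        archived["original_id"] = archived.pop("_id")
        # The date stored with the document is what lets you find
        # "the version as of time T" later.
        archived["archived_at"] = datetime.now(timezone.utc)
        archive.insert_one(archived)
    current.replace_one({"_id": new_doc["_id"]}, new_doc, upsert=True)
```

Earlier versions of a record can then be fetched from the archive collection by filtering on `original_id` and sorting on `archived_at`.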

Derick
  • That is how I am doing it at the moment. I write a complete copy of the old record to history, and keep the new one in the current collection. Given that RAM is a constraint, I would rather have the RAM hold the indexes of the actual collection than compromise it by holding parts of actual and parts of history. There is little data now, but when all is said and done, I expect history will probably grow to be bigger. – Nasir Jun 23 '13 at 15:56
  • But the OS will only keep the indexes and data in RAM that are frequently used. If you hardly ever consult the older data, then those documents won't be kept in RAM. You could opt for *multiple* archive collections though, for example one for each month or year. But that of course makes searching harder... – Derick Jun 24 '13 at 09:18
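
As an illustration of the per-month variant from the last comment, the sketch below derives an archive collection name from a date and lists the collections a search over a date range would have to touch. The `<collection>_archive_<YYYY>_<MM>` naming scheme and both function names are assumptions, not anything MongoDB prescribes.

```python
from datetime import datetime


def archive_collection_name(collection, when):
    """Name of the monthly archive collection for a given date."""
    return f"{collection}_archive_{when:%Y_%m}"


def archive_names_between(collection, start, end):
    """All monthly archive collections covering start..end, inclusive."""
    names = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        names.append(f"{collection}_archive_{year:04d}_{month:02d}")
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return names


# A lookup across four months already has to query four collections.
print(archive_names_between("users", datetime(2012, 11, 5), datetime(2013, 2, 1)))
```

This is the "searching gets harder" cost: every historical query spanning several months becomes one query per collection.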