I am looking for a simple versioning system for a large number of records or files (~50 million, ~100GB unpacked, ~20MB packed). The files are only a few Kilobytes each, and have unique IDs, so I don't mind whether they are stored in a flat structure (table, directory...) or not. On average, each record is changed once a month, but most changes have diffs less than a Kilobyte so it should be easy to compress versions. However, a naive database with one entry for each version would grow too quickly. I need the following operations:
- basic CRUD operations: create, read, update, delete
- quick listing of recent changes
- quick listing of recent changes of a particular record
- query for changes in a given period of time
- query for changes by a given user (each edit is associated to some user id and optionally has a commit message as comment)
- for write operations there must be a commit hook to validate and reject illformed records.
In short, I am looking for a Wiki-like software for simple records or files.
I thought about possible solutions:
Put files in a version control system. This gives me replication and many available access tools, so it is my preferred solution. But the amount of data is too large for distributed systems like git. Is anyone using Subversion for a similar task with success?
Implement my own versioning in a database or in a file system. I would pobably need to store only compressed records and diffs, would have more work and learn something. This would be my preferred solution, if it was just for fun.
Use a versioning file system. This would make setup, replication and access more difficult. Probably I would need to implement my own access API above the file system.
Use a versioning database system. Can you suggest some?
Use some other existing data store with versioning (MediaWiki?, Amazon Cloud Drive?, ...)
Obviously there are many pathes. Which pathes have been used by others with success for similar or larger amounts of data?