We have an in-house NoSQL database that basically stores everything in a compact binary file. I now need a data structure similar to a key-value store or a B+ tree. The issue is that the 'value' in my case can be of different types, and its size varies widely, from about 1 KB up to 1-2 GB. Typically the key is a string and the value is a stream of data: a stream of ints, of strings, or of a custom type.
I was thinking about implementing a B+ tree, but that's not easy because a B+ tree needs the 'value' to be of a single type, and each value must be small enough to fit in a relatively small block. There may be a variant that handles this, but I didn't find a tutorial on implementing a B+ tree with examples showing how to store it on disk. Most of the tutorials I have seen cover only in-memory B+ trees.
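To make the block-size constraint concrete, here is a minimal sketch of one variant I can imagine: the tree stores only a small fixed-size locator, and the variable-size value lives in a separate append-only blob file. This is an assumption on my part, not something taken from a tutorial; `ValueRef`, `append_blob`, `read_blob` and the `type_tag` values are hypothetical names, and the tree itself is not shown.

```cpp
#include <cstdint>
#include <filesystem>
#include <fstream>
#include <stdexcept>
#include <string>

// Hypothetical fixed-size record that a B+ tree leaf could hold
// in place of the value itself.
struct ValueRef {
    std::uint64_t offset;    // byte position of the value in the blob file
    std::uint64_t length;    // size of the value in bytes
    std::uint32_t type_tag;  // hypothetical: 0 = ints, 1 = strings, 2 = custom
};

// Append the raw bytes to the blob file and return where they were stored.
inline ValueRef append_blob(const std::string& blob_file,
                            const std::string& bytes,
                            std::uint32_t type_tag) {
    namespace fs = std::filesystem;
    // The new value starts at the current end of the file.
    std::uint64_t offset =
        fs::exists(blob_file) ? static_cast<std::uint64_t>(fs::file_size(blob_file)) : 0;
    std::ofstream out(blob_file, std::ios::binary | std::ios::app);
    if (!out) throw std::runtime_error("cannot open " + blob_file);
    out.write(bytes.data(), static_cast<std::streamsize>(bytes.size()));
    return ValueRef{offset, static_cast<std::uint64_t>(bytes.size()), type_tag};
}

// Read back exactly the bytes described by the locator.
// A real 1-2 GB value would be streamed in chunks instead.
inline std::string read_blob(const std::string& blob_file, const ValueRef& ref) {
    std::ifstream in(blob_file, std::ios::binary);
    if (!in) throw std::runtime_error("cannot open " + blob_file);
    in.seekg(static_cast<std::streamoff>(ref.offset));
    std::string bytes(static_cast<std::size_t>(ref.length), '\0');
    in.read(&bytes[0], static_cast<std::streamsize>(ref.length));
    if (in.gcount() != static_cast<std::streamsize>(ref.length))
        throw std::runtime_error("short read from " + blob_file);
    return bytes;
}
```

With this shape every leaf entry has the same size regardless of how large the value is, which is the property the B+ tree needs.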
I then had the idea of using the folder/file name as the key, with the value being whatever is inside the file. Values could then be of arbitrary size, which is really what I want. So my question is about this approach. In the extreme case:
- data for different days is stored in separate folders
- I can have 1M-50M keys (that is, files/folders) to store on disk per day
- Operations on a file will generally be 'read-only' or 'append to' during the day. Historical data will never be modified.
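To make the layout concrete, here is a minimal sketch of what I have in mind, assuming C++17 `std::filesystem`. The names `key_path`, `append_value` and `read_value`, the `<root>/<day>/<shard>/<key>` layout and the shard count of 4096 are all my own placeholders. The extra shard level is an assumption to avoid putting millions of entries in one directory, and the sketch assumes every key is already a valid file name on both Windows and Linux.

```cpp
#include <cstdint>
#include <filesystem>
#include <fstream>
#include <iterator>
#include <stdexcept>
#include <string>

namespace fs = std::filesystem;

// Hypothetical layout: <root>/<day>/<shard>/<key>
inline fs::path key_path(const fs::path& root, const std::string& day,
                         const std::string& key, std::uint32_t shards = 4096) {
    // FNV-1a hash: gives the same shard on every run and platform,
    // which std::hash does not guarantee.
    std::uint32_t h = 2166136261u;
    for (unsigned char c : key) {
        h ^= c;
        h *= 16777619u;
    }
    return root / day / std::to_string(h % shards) / key;
}

// 'Append to' operation: create the folders on first use, then append bytes.
inline void append_value(const fs::path& p, const std::string& bytes) {
    fs::create_directories(p.parent_path());
    std::ofstream out(p, std::ios::binary | std::ios::app);
    if (!out) throw std::runtime_error("cannot open " + p.string());
    out.write(bytes.data(), static_cast<std::streamsize>(bytes.size()));
}

// 'Read-only' operation. Loads the whole value into memory for brevity;
// a 1-2 GB value would be read in chunks instead.
inline std::string read_value(const fs::path& p) {
    std::ifstream in(p, std::ios::binary);
    if (!in) throw std::runtime_error("cannot open " + p.string());
    std::istreambuf_iterator<char> first(in), last;
    return std::string(first, last);
}
```

For example, `append_value(key_path("data", "2024-01-02", "sensor_a"), bytes)` would append to the file for key `sensor_a` under that day's folder; the root, date format and key here are made up for illustration.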
I've seen that a modern OS can hold ~4 billion files, so I'm happy with that approach for ~2 years of storage on a single machine. I'm just worried whether implementing a key-value store this way is a very bad idea. If so, why? What issues can I run into when dealing with the filesystem (a fragmented disk on Windows, for example)?
Everything is implemented in C++, on both Windows and Linux.