
We have our own in-house NoSQL DB that basically stores everything in a compact binary file. Now I need a data structure similar to a key-value store or B+ tree. The issue is that the 'value' in my case can be of different types and of very variable size, anywhere from 1 KB to 1-2 GB. Typically the key is a string, and the value is a stream of data: a stream of ints, strings, or of a custom type.

I was thinking about implementing a B+ tree, but that's not easy, because a B+ tree needs the 'value' to be of a single type and small enough to fit in a relatively small block. There may be a variant that lifts this restriction, but I didn't find a tutorial on how to implement a B+ tree with examples showing how to store it on disk. Most of the tutorials I've seen cover only in-memory B+ trees.

I then had the idea of using the folder/file name as the key, with the value being whatever is stored inside the file. Values could then be of arbitrary size, which is exactly what I want. So here is my question. In the extreme case,

  • data for different days is stored in separate folders
  • I can have 1M-50M keys (i.e. files/folders) to store on disk for a single day
  • operations on a file will generally be read-only, plus appends during the day; historical data will never be modified.

I've seen that I can have ~4 billion files on a modern OS, so I'm happy with that approach for ~2 years of storage on a single machine. I just worry that this way of implementing a key-value store is very bad. Is it? Why? What issues can I run into when dealing with the filesystem (a fragmented disk on Windows, for example)?

Everything is implemented in C++ on both Windows and Linux.
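
To make the idea concrete, here is a rough sketch of what I have in mind (C++17, std::filesystem). The FileKVStore class, the one-folder-per-day layout and the whole-file read are only illustrative, and keys are assumed to already be valid file names:

    // Rough sketch of the file-per-key idea (illustrative only).
    // Assumes C++17 and keys that are already valid file names.
    #include <filesystem>
    #include <fstream>
    #include <iterator>
    #include <string>
    #include <vector>

    namespace fs = std::filesystem;

    class FileKVStore {
    public:
        // One folder per day, e.g. root/2016-05-25/
        FileKVStore(fs::path root, const std::string& day) : dir_(root / day) {
            fs::create_directories(dir_);
        }

        // Append a blob of bytes to the value stored under `key` (creates the file if needed).
        void append(const std::string& key, const char* data, std::size_t size) {
            std::ofstream out(dir_ / key, std::ios::binary | std::ios::app);
            out.write(data, static_cast<std::streamsize>(size));
        }

        // Read the whole value back; large values would be streamed instead.
        std::vector<char> read(const std::string& key) const {
            std::ifstream in(dir_ / key, std::ios::binary);
            return { std::istreambuf_iterator<char>(in),
                     std::istreambuf_iterator<char>() };
        }

    private:
        fs::path dir_;
    };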

ctNGUYEN
  • What will the format of your key be if you are planning to use folder/file names as the key? – sameerkn May 25 '16 at 12:15
  • Keys will be normal strings and are 100% legal for folder/file naming. – ctNGUYEN May 25 '16 at 13:07
  • Disk fragmentation isn't an issue on SSDs anyway. And since you appear not to delete old data, you only need a single full-disk write, which is far below SSD endurance limits (typically 1000+ full-disk writes). – MSalters May 25 '16 at 15:41

3 Answers


I think that if you can secure and match your requirements, it shouldn't be bad. I have done a similar thing for an embedded project with a limited set of data.

Things that need to be considered:

  1. The OS/filesystem should support the required length of the folder name (the key) and the file name (however you choose it).
  2. It does fragment the disk and might slow down disk access for huge directory structures, which can affect the overall system.
  3. Application performance might degrade, since every read/write goes through a file operation; you can probably add a cache in your program if required.
  4. It is not great for multi-threaded applications; file locking should be taken care of (see the sketch after this list).
  5. Security should be taken care of.
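
For point 4, a minimal sketch of what in-process locking could look like is below; the PerFileLock class is only illustrative, and coordinating several processes would additionally need OS-level locks (flock on Linux, LockFileEx on Windows), which are not shown here.

    // Serialize appends per file with a mutex map (in-process only).
    #include <fstream>
    #include <map>
    #include <mutex>
    #include <string>

    class PerFileLock {
    public:
        // Append to `path` while holding the mutex associated with that path.
        void lockedAppend(const std::string& path, const std::string& data) {
            std::mutex* m = nullptr;
            {
                std::lock_guard<std::mutex> guard(mapMutex_);  // protect the map itself
                m = &locks_[path];                             // creates the mutex on first use
            }
            std::lock_guard<std::mutex> guard(*m);             // one writer per file at a time
            std::ofstream out(path, std::ios::binary | std::ios::app);
            out << data;
        }

    private:
        std::mutex mapMutex_;
        std::map<std::string, std::mutex> locks_;  // std::map keeps element addresses stable
    };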
Varadhan Work
  • Thanks for your confirmation. [1] In the worst case, I will create a bijective map from the native key to a number, so there is no issue with folder/file names. [2] Even if I only create new files and never delete files? That's the point that interests me the most. Do you have any reference about this issue? [3][4][5] We have another layer to protect the data on the backend: caching, concurrent read/write, security... – ctNGUYEN May 25 '16 at 13:15
  • I just thought about Hadoop HDFS; I'm not sure if you intend to use it, but it also provides simple interfaces to store data, similar to a local file system. You can't modify data, but I guess you can append to it. With regard to concurrency, caching and security it is better. – Varadhan Work May 31 '16 at 11:53
  • Hmm, the thing is we have a lot of small files, and HDFS seems limited in the number of files (only ~10M). That could be the amount of data for a single day for us in an above-average case. – ctNGUYEN May 31 '16 at 13:35

Why are you worried about the size of the value? You can use your existing DB. The value can be a string of the following format, "type|value_data", where "|" is a separator.

Here, value_data can be either the actual value or the path of a file which contains the value (a minimal sketch follows the list below):

  • type = LOCAL (in this case value_data will be the actual value, if it fits in the DB)
  • type = REMOTE (in this case value_data will be the path of a file)
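
A minimal sketch of that convention, assuming an arbitrary 1 MB inline cutoff and an arbitrary "blobs" spill directory (both of which are assumptions for illustration only):

    // Encode a value: keep it inline if it is small, otherwise write it to a
    // file and store only the path in the DB.
    #include <cstddef>
    #include <filesystem>
    #include <fstream>
    #include <iterator>
    #include <string>

    std::string encodeValue(const std::string& key, const std::string& value) {
        constexpr std::size_t kInlineLimit = 1 << 20;  // assumed 1 MB cutoff
        if (value.size() <= kInlineLimit)
            return "LOCAL|" + value;

        const std::filesystem::path path = std::filesystem::path("blobs") / key;
        std::filesystem::create_directories(path.parent_path());
        std::ofstream(path, std::ios::binary) << value;
        return "REMOTE|" + path.string();
    }

    // Decode: split on the first '|' and either return the inline data or load the file.
    std::string decodeValue(const std::string& stored) {
        const auto sep = stored.find('|');
        const std::string type = stored.substr(0, sep);
        const std::string data = stored.substr(sep + 1);
        if (type == "LOCAL")
            return data;
        std::ifstream in(data, std::ios::binary);
        return { std::istreambuf_iterator<char>(in), std::istreambuf_iterator<char>() };
    }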
sameerkn

"Data for different days is stored in different folders" - this isn't convenient if you want to search for a single across days.

Also, you might run into problems when the number of files per folder exceeds a filesystem limit. 4 billion files on a disk is not a problem; 50M in a single folder is. But you don't need to store everything in the same folder, of course. A key can be partitioned into a folder part and a filename part.
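
For example, if you only ever look keys up exactly and never need range scans, you could hash the key and use a few hex digits of the hash as the folder, so no single directory grows past a few thousand entries. This is only a sketch; the keyToPath helper and the 4096-bucket count are arbitrary:

    // Map a key to root/<3-hex-digit bucket>/<key> using std::hash.
    #include <cstdio>
    #include <filesystem>
    #include <functional>
    #include <string>

    std::filesystem::path keyToPath(const std::filesystem::path& root,
                                    const std::string& key) {
        const std::size_t h = std::hash<std::string>{}(key);
        char bucket[8];
        std::snprintf(bucket, sizeof bucket, "%03zx", h % 4096);  // 4096 folders: "000".."fff"
        return root / bucket / key;                               // e.g. root/1a7/<key>
    }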

Things do get tricky if you need to rely on the B-tree property of finding a range of keys. This means you need an ordering, and cannot use a hash function to map the key to a folder/filename pair. In that case, you have an issue. The worst case is that your keys are just "1" to "999999999" consecutively, plus a random set of much larger keys. That means you can't use the last 4 digits as the filename (too many folders) or the last 8 digits (too many files).
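
One order-preserving layout that avoids both extremes, assuming for this sketch that keys are numeric and fit in 12 digits, is to zero-pad the key and split it across several directory levels, so no level holds more than 10,000 entries while the lexicographic order of paths still matches the numeric order of keys:

    // Zero-pad a numeric key to 12 digits and split it as 4/4/4 across two
    // folder levels and a file name, e.g. 42 -> root/0000/0000/0042.
    #include <filesystem>
    #include <string>

    std::filesystem::path numericKeyToPath(const std::filesystem::path& root,
                                           unsigned long long key) {
        std::string s = std::to_string(key);
        if (s.size() < 12)
            s.insert(0, 12 - s.size(), '0');   // pad to 12 digits
        return root / s.substr(0, 4)           // first level:  at most 10,000 folders
                    / s.substr(4, 4)           // second level: at most 10,000 folders
                    / s.substr(8, 4);          // leaf:         at most 10,000 files
    }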

MSalters
  • Great. The cross-day issue is not huge for us, since 80% of the queries users run are on a single day. Agreed that range queries still have an issue here; maybe I'll create another index file alongside all the data files, then precalculate and store all aggregate indices in it. That's something to do. But the most important thing I want to ask here is: with that many files, what deserves more attention, and what could be the potential issues and limitations... – ctNGUYEN May 25 '16 at 17:31