Imagine you have a filesystem tree:
root/AA/aadata
root/AA/aafile
root/AA/aatext
root/AB/abinput
root/AB/aboutput
root/AC/acinput
...
In total there are around 10 million files, each around 10 KB in size. They are essentially a key-value store, split into folders only to keep performance acceptable (the FS would choke if I put 5 million files in a single directory).
Now we need to:
archive this tree into a single big file (it must be relatively fast to build but still have a good compression ratio; 7z, for example, is too slow)
seek within the resulting big file very quickly - so when I need the content of "root/AB/aboutput", I can read it with minimal latency.
I won't use Redis because the number of files may grow in the future and they would no longer fit in RAM. On the other hand, I can use SSD-backed servers, so data access will be relatively fast (compared to an HDD).
Also, it should not require any exotic filesystem such as squashfs or similar; it should work on an ordinary ext3, ext4, or NTFS volume.
I also thought about storing the files as zlib-compressed strings, remembering the offset of each string, and keeping something like a map of path to offset in RAM. Each time I need a file, I would look up its offset in the map and then read the content from the big file at that offset. But maybe there is something simpler or already available?
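Just to make the idea concrete, here is a minimal sketch in Python of what I mean. The helper names (pack_tree, load_index, read_entry) and the archive.bin / archive.idx file names are only illustrative; zlib handles compression, a JSON file holds the path -> (offset, length) index, and lookups are a plain seek + read:

    # Sketch only: helper names and file names here are hypothetical.
    import json
    import os
    import zlib

    def pack_tree(root, data_path, index_path):
        """Walk `root`, append zlib-compressed file contents to one big
        data file, and record (offset, length) per relative path."""
        index = {}
        with open(data_path, "wb") as out:
            for dirpath, _, filenames in os.walk(root):
                for name in filenames:
                    full = os.path.join(dirpath, name)
                    rel = os.path.relpath(full, root)
                    with open(full, "rb") as f:
                        blob = zlib.compress(f.read(), 6)
                    offset = out.tell()
                    out.write(blob)
                    index[rel] = (offset, len(blob))
        with open(index_path, "w") as f:
            json.dump(index, f)

    def load_index(index_path):
        """Load the path -> (offset, length) map into RAM."""
        with open(index_path) as f:
            return json.load(f)

    def read_entry(data_path, index, rel_path):
        """Seek to the stored offset and decompress one entry."""
        offset, length = index[rel_path]
        with open(data_path, "rb") as f:
            f.seek(offset)
            return zlib.decompress(f.read(length))

    # Example usage (paths are hypothetical):
    # pack_tree("root", "archive.bin", "archive.idx")
    # idx = load_index("archive.idx")
    # content = read_entry("archive.bin", idx, os.path.join("AB", "aboutput"))

In a real setup I would of course keep the data file handle open (or mmap it) between lookups instead of reopening it per read, assuming the archive is write-once and read-many.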