
I need to track a large volume of inotify events for a set of files that, during their lifetime, will move between several specific directories with their inodes intact. I need to track the movement of these inodes, as well as creations, deletions, and changes to each file's content. There will be many hundreds of changes per second.

Because of limited resources, I can't store it all in RAM (or on disk, or in a database).

Luckily, most of these files will be deleted in short order; their content and movement history just needs to be stored for later analysis. The files that are not deleted immediately will stay in a particular directory for a known period of time.

So it seems to me that I need a data structure that is partially stored in RAM and partially saved to disk; part of the portion saved to disk will need to be recalled (for the files that are not deleted), but most will not. I will not need to query the data, only access it by an identifier (the file name, which matches [A-Z0-9]{8}). It would be helpful to be able to configure when the file data is flushed to disk.

Does such a beast exist?

Edit: I've asked a related question.

mikewaters

1 Answer


Why not a database? Say SQLite.

While SQLite isn't the most efficient storage mechanism in terms of space, it has a number of advantages -- first and foremost, it is an SQL RDBMS. The amount of memory SQLite uses (to temporarily cache data) can be configured through the cache_size pragma.
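As a rough sketch of how this could look for the scenario in the question (the schema, column names, and sample rows are made up for illustration -- adapt them to your actual event data):

```python
import sqlite3, tempfile, os

# Hypothetical event log: one row per inotify event, keyed by the
# 8-character file name described in the question.
db_path = os.path.join(tempfile.mkdtemp(), "events.db")
conn = sqlite3.connect(db_path)

# cache_size with a negative value is interpreted as KiB, so this caps
# SQLite's page cache at roughly 16 MB of RAM.
conn.execute("PRAGMA cache_size = -16000")

conn.execute("""
    CREATE TABLE IF NOT EXISTS events (
        name  TEXT,   -- [A-Z0-9]{8} file name
        event TEXT,   -- e.g. 'create', 'move', 'delete', 'modify'
        path  TEXT,   -- directory the file was in at the time
        ts    REAL    -- event timestamp
    )
""")
conn.execute("CREATE INDEX IF NOT EXISTS idx_name ON events(name)")

# Batch many events into one transaction -- commits, not individual
# inserts, are the expensive operation at hundreds of events/second.
with conn:
    conn.executemany(
        "INSERT INTO events VALUES (?, ?, ?, ?)",
        [("ABC123XY", "create", "/incoming", 1309536000.0),
         ("ABC123XY", "move",   "/staging",  1309536001.0)],
    )

# Later analysis is then just a lookup by identifier.
rows = conn.execute(
    "SELECT event, path FROM events WHERE name = ? ORDER BY ts",
    ("ABC123XY",),
).fetchall()
```

The key tuning knobs here are the cache_size pragma (how much stays in RAM) and how often you commit (how often data is flushed to disk), which maps fairly directly onto the "configure when the file data is flushed" requirement.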

If SQLite isn't an option, what about one of the "key-value stores"? They range from distributed client/server in-memory (e.g. memcached) to local embedded disk-based (e.g. BDB) to memory-with-a-persistent-backing-for-overflow, and everywhere in between. They do not offer SQL DDL/DQL (although some allow relationships), but they are efficient at what they do -- storing keys and values.
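To make the embedded disk-based flavor concrete, here is a minimal sketch using Python's stdlib `dbm` module, which fills the same niche as BDB (the keys and the JSON-encoded history format are just one possible layout, not something the question prescribes):

```python
import dbm, json, tempfile, os

# Embedded, local, disk-based key-value store: keys are the
# 8-character file names, values are JSON-encoded event histories.
path = os.path.join(tempfile.mkdtemp(), "history")

with dbm.open(path, "c") as store:   # "c" creates the file if needed
    store["ABC123XY"] = json.dumps(
        [{"event": "create", "path": "/incoming"},
         {"event": "move",   "path": "/staging"}]
    )
    # Access is strictly by identifier -- no querying, matching
    # the requirement in the question.
    history = json.loads(store["ABC123XY"])
```

Note that `dbm` writes through to disk rather than keeping a RAM tier, so on its own it covers the persistence half of the problem, not the memory-with-overflow behavior.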

Of course, one could always implement an LRU structure (say, a bounded sorted list) with overflow to a simple extensible disk-based hash implementation... but... consider the above first :) [There may also be some micro-KV libraries/source out there.]
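A rough sketch of that do-it-yourself idea, assuming Python: recently touched entries live in an `OrderedDict`, and when the cap is exceeded the least-recently-used entry spills to a `shelve` (a disk-backed dict) instead of being dropped. The class name and the limit are invented for illustration:

```python
import shelve, tempfile, os
from collections import OrderedDict

class SpillingCache:
    """LRU cache in RAM; evicted entries overflow to disk via shelve."""

    def __init__(self, path, limit=1000):
        self.ram = OrderedDict()      # insertion order == recency order
        self.disk = shelve.open(path) # persistent overflow area
        self.limit = limit            # max entries kept in RAM

    def put(self, key, value):
        self.ram[key] = value
        self.ram.move_to_end(key)     # mark as most recently used
        while len(self.ram) > self.limit:
            # Pop the least-recently-used entry and flush it to disk.
            old_key, old_val = self.ram.popitem(last=False)
            self.disk[old_key] = old_val

    def get(self, key):
        if key in self.ram:
            self.ram.move_to_end(key)
            return self.ram[key]
        return self.disk[key]         # fall back to the disk overflow

cache = SpillingCache(os.path.join(tempfile.mkdtemp(), "spill"), limit=2)
cache.put("AAAA0001", ["create"])
cache.put("AAAA0002", ["create"])
cache.put("AAAA0003", ["create"])   # evicts AAAA0001 to disk
evicted = cache.get("AAAA0001")     # still retrievable from disk
```

The `limit` parameter is the flush-to-disk knob the question asks for: entries that stay hot (files not yet deleted) remain in RAM, while cold histories migrate to disk for later analysis.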

Happy coding.

  • Thanks! "memory-with-a-persistent-backing-for-overflow" is exactly what I am looking for. Will check out your references immediately. – mikewaters Jul 01 '11 at 17:48
  • Is SQLite capable of handling a large quantity of writes/sec (from the same process)? – mikewaters Jul 01 '11 at 18:20
  • @threecheeseopera SQLite is *very fast* in a non-contention scenario. While *commits* are limited by the speed of the HDD (say, [20-40/second](http://www.sqlite.org/faq.html#q19), but *much higher* on SSD), updates can reach well into the *tens of thousands* per second (depending, of course). Just remember to use transactions :) While very old, here is a general idea: [Speed comparison](http://www.sqlite.org/speed.html). –  Jul 01 '11 at 22:50