0

I'm tracking a Linux filesystem (which could be of any type) with the pyinotify module for Python (the Linux kernel's inotify does the actual work behind it). My application tracks as many directories and files as the user wants, and now I would like to track the md5sum of each file and store it in a database, covering every move, rename, new file, etc.

I guess a database would be the best option for storing the md5sum of every file, but which database would be best for that? Certainly a high-performance one. I'm looking for a free one, because the application is going to be GPL.

Pabluez

3 Answers

0

The first database I'd attempt would be SQLite3. SQLite3 is easy to use, very well tested, provides a large array of interface libraries and pre-written tools to work with databases, and it is very easy to "embed" into an application. (Far easier than getting MySQL or PostgreSQL installed on a system.)

SQLite3 also seems "easier" for people to work with than Berkeley DB, which is the main alternative to SQLite3.
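As a rough illustration of how little setup this takes, here is a minimal sketch using Python's built-in `sqlite3` and `hashlib` modules. The table name `checksums` and the helper names (`file_md5`, `open_store`, `record_file`, `rename_file`, `lookup`) are made up for this example, and the pyinotify event handling that would call them is left out.

```python
import hashlib
import sqlite3


def file_md5(path, chunk_size=65536):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def open_store(db_path):
    """Open (or create) the database and make sure the table exists."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS checksums ("
        "path TEXT PRIMARY KEY, md5 TEXT NOT NULL)"
    )
    return conn


def record_file(conn, path):
    """Hash a new or modified file and store the result under its path."""
    checksum = file_md5(path)
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR REPLACE INTO checksums (path, md5) VALUES (?, ?)",
            (path, checksum),
        )
    return checksum


def rename_file(conn, old_path, new_path):
    """Carry the stored checksum over when a file is moved or renamed."""
    with conn:
        conn.execute(
            "UPDATE OR REPLACE checksums SET path = ? WHERE path = ?",
            (new_path, old_path),
        )


def lookup(conn, path):
    """Return the stored checksum for a path, or None if it is unknown."""
    row = conn.execute(
        "SELECT md5 FROM checksums WHERE path = ?", (path,)
    ).fetchone()
    return row[0] if row else None
```

A create or modify event would call `record_file`, and a move event would call `rename_file`, so a rename does not require re-reading the file.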

sarnold
  • Let's suppose that complex installation and maintenance is not a deal breaker. What would you choose? Still SQLite3? I'm afraid that a user might choose to track a directory like a mail spool, with tons of mail arriving (and creating tons of files). With the tracker working all the time and adding the MD5 checksum of each file, it would need a very light and fast database so as not to send the load average sky-high, wouldn't it? – Pabluez Dec 11 '11 at 02:48
0

Sounds like you want a key-value store rather than a full-blown database. You could take a look at LevelDB from Google. Given that it doesn't have the features of a full-blown SQL database and was designed for efficiency, it's likely to be the best-performing solution. There are some performance numbers on the linked page.
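To show what the key-value shape looks like for this problem, here is a small sketch that maps a file path to its MD5 digest. Note that it does not use LevelDB: Python bindings for LevelDB are third-party packages, so the standard-library `dbm` module stands in here purely to illustrate the access pattern. The helper names are invented for the example.

```python
import dbm
import hashlib


def put_checksum(db, path, data):
    """Store the MD5 hex digest of `data` under the file's path."""
    db[path.encode("utf-8")] = hashlib.md5(data).hexdigest().encode("ascii")


def get_checksum(db, path):
    """Return the stored digest for a path, or None if it is unknown."""
    key = path.encode("utf-8")
    if key not in db:
        return None
    return db[key].decode("ascii")


def move_checksum(db, old_path, new_path):
    """Re-key an entry when a file is moved or renamed."""
    old_key = old_path.encode("utf-8")
    if old_key in db:
        db[new_path.encode("utf-8")] = db[old_key]
        del db[old_key]
```

Usage would look like `with dbm.open("checksums", "c") as db: put_checksum(db, path, contents)`. There is no schema and no query language, only get, put and delete by key, which is all this workload appears to need.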

0

You could try Redis. It is most certainly fast.

But really, since you're tracking a filesystem, and disks are slow as snails in comparison to even a medium-fast database, performance shouldn't be your primary concern.

Thomas