
I need to store short strings (50+ characters), and quickly look them up. At first I wanted to use SQLite for this. I created a single table with a single indexed TEXT column. Using all kinds of low-level tricks I could fill the database with 1 million strings in 10 seconds.
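The setup described above might look roughly like this. This is a sketch using Python's `sqlite3` module for illustration; the table name, column name, and the particular PRAGMA "tricks" (single transaction, WAL journal, relaxed syncing) are assumptions, not the asker's actual code:

```python
import os
import sqlite3

# Start with a fresh file for the demo.
if os.path.exists("strings.db"):
    os.remove("strings.db")

conn = sqlite3.connect("strings.db")
conn.execute("PRAGMA journal_mode = WAL")   # cheaper commits than the default rollback journal
conn.execute("PRAGMA synchronous = OFF")    # trade crash-safety for insert speed

# One table, one indexed TEXT column.
conn.execute("CREATE TABLE IF NOT EXISTS items (value TEXT)")
conn.execute("CREATE INDEX IF NOT EXISTS idx_value ON items (value)")

# Bulk-insert inside a single transaction instead of one commit per row.
rows = [("string-%07d" % i,) for i in range(100_000)]
with conn:
    conn.executemany("INSERT INTO items (value) VALUES (?)", rows)
```

The single transaction is the biggest win here: without it, SQLite fsyncs once per INSERT.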

The problem was that if the PC was rebooted, adding a further 10,000 rows took 30 seconds, which is nowhere near the original rate of 1M rows in 10s. The reason is that SQLite has to read a large part of the existing index back from disk before it can add the new rows. Another problem is that the database doubles in size: because of the b-tree mechanism, every string is stored twice (once in the regular table, once in the index table).

Is there a simple alternative, one that uses hash-based lookups, for example? I know about the various NoSQL solutions, but they are focused on decentralized and distributed storage, while I just need a simple embedded 'flat-file' database.
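As a point of reference for what such a hash-based embedded store looks like, Python's standard library ships one in its `dbm` module (the file name below is made up). Inserts go straight into hash buckets, so there is no b-tree to rebuild on startup:

```python
import dbm

# dbm.open picks an available backend (gdbm, ndbm, or the pure-Python
# "dumb" fallback); keys and values are bytes.
with dbm.open("demo_store", "c") as db:   # "c": create the file if missing
    db[b"some-50-char-string"] = b"1"     # hash-bucket insert, no index rebuild

# Reopen read-only and look the key up again.
with dbm.open("demo_store", "r") as db:
    found = b"some-50-char-string" in db
```

Berkeley DB and similar libraries offer the same model with more control over the hashing.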

Maestro
  • I don't get it regarding the index? Are you using the same value as key and value? – jgauffin Nov 09 '11 at 13:56
  • You need to specify which language/platform if you want an embedded database. – jgauffin Nov 09 '11 at 13:57
  • An index is totally useless if you are using the same value as both index and value. – jgauffin Nov 09 '11 at 14:09
  • possible duplicate of [Embedded non-relational (nosql) data store](http://stackoverflow.com/questions/2081080/embedded-non-relational-nosql-data-store) – jgauffin Nov 09 '11 at 14:10
  • possible duplicate of http://stackoverflow.com/questions/2374496/cross-platform-embedded-database-key-value-store-for-c-sharp – jgauffin Nov 09 '11 at 14:10
  • @jgauffin Care to explain? The index allows me to quickly find the rows starting with 'abc', if I didn't have the index, it would require a full-table scan, which would be so much slower? – Maestro Nov 09 '11 at 14:10
  • How would a full table scan be slower if the index contains exactly the same data as the actual table? You also need to take into account that everything is duplicated in both the table and the index. – jgauffin Nov 09 '11 at 14:12
  • @jgauffin Because the data in the index is sorted, it can do a quick binary search and skip most of the rows. With a full-table scan, **every** row has to be inspected. – Maestro Nov 09 '11 at 14:13
  • Hi @Joshua, did you manage to solve your problem? – Steven de Salas Nov 10 '11 at 17:47
  • @Steven Sort of.. I'm currently experimenting with SQLite FTS tables; they keep multiple b-tree branches that are merged when needed, so additional inserts take almost no time. – Maestro Nov 10 '11 at 18:20

1 Answer


How critical is it to persist the latest values to disk?

You can use an in-memory collection such as a Dictionary or List and persist it to disk by serializing it to a local file in an asynchronous call at regular intervals.

Then when your application starts you can deserialize it back into memory.

This would be a far more responsive way to accomplish what you are after, as you don't have to keep hammering the disk with read/write operations.
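A minimal sketch of that pattern, with Python's `pickle` standing in for whatever serializer the platform provides (the file name, lock discipline, and 30-second interval are all illustrative choices, not part of the answer):

```python
import os
import pickle
import threading

STATE_FILE = "state.pickle"   # hypothetical path
lock = threading.Lock()
data = {}                     # the in-memory collection doing the real work

def load():
    """At startup, deserialize the last snapshot back into memory."""
    global data
    if os.path.exists(STATE_FILE):
        with open(STATE_FILE, "rb") as f:
            data = pickle.load(f)

def snapshot():
    """Serialize a copy of the collection to disk."""
    with lock:
        copy = dict(data)     # snapshot under the lock, write outside it
    tmp = STATE_FILE + ".tmp"
    with open(tmp, "wb") as f:
        pickle.dump(copy, f)
    os.replace(tmp, STATE_FILE)  # atomic swap, so a crash never leaves a torn file

def snapshot_periodically(interval=30.0):
    """Re-arm a timer so snapshots happen in the background."""
    snapshot()
    threading.Timer(interval, snapshot_periodically, args=(interval,)).start()
```

Writing to a temporary file and swapping it in keeps the previous snapshot intact if the process dies mid-write.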

Steven de Salas
  • The data is already in a Dictionary before I insert it into SQLite, so it would be fairly easy to implement. But deserializing it back into memory would still require reading the whole file from disk (correct me if I'm wrong). It may be faster than SQLite, because it would be one long sequential read instead of many small random reads. But it's still not what I'm looking for: being able to perform inserts without reading all previous records. – Maestro Nov 09 '11 at 14:32
  • How about appending to a plain text file? – Steven de Salas Nov 09 '11 at 14:34
  • It would reverse the problem: with a text file I can do inserts without reading the previous records, but to look up a single key I have to read 50% of the file on average, because I cannot translate the hash/key to a position within that file. I could create a second file that maps the hash values to file offsets, but then I'm essentially re-inventing the wheel :) – Maestro Nov 09 '11 at 14:43
  • Hmm.. you could use the binary serialization I mentioned in the answer, except that at startup you commence your app with a new Dictionary until the serialized data is loaded into memory, then you merge the two dictionaries by copying your new hash values into the one loaded from disk. Simpler solution, but it will only work if you don't need to base your application logic on whether a particular result has already been added. – Steven de Salas Nov 09 '11 at 15:17
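The "second file mapping hashes to offsets" idea the comments converge on is essentially a small log-structured store: appending on insert, one seek on lookup, and a single sequential scan at startup to rebuild the index. A rough sketch (all names hypothetical; it assumes keys and values contain no tabs or newlines, and a real version would persist the index too):

```python
import os

class LogStore:
    """Append-only value log with an in-memory key -> offset index."""

    def __init__(self, path):
        self.path = path
        self.index = {}
        # One sequential pass over the log rebuilds the index on startup.
        if os.path.exists(path):
            with open(path, "rb") as f:
                offset = 0
                for line in f:
                    key = line.rstrip(b"\n").split(b"\t", 1)[0]
                    self.index[key] = offset
                    offset += len(line)

    def put(self, key, value):
        """Insert = one append; no previous records are read."""
        with open(self.path, "ab") as f:
            offset = f.tell()                    # end of file in append mode
            f.write(key + b"\t" + value + b"\n")
        self.index[key] = offset

    def get(self, key):
        """Lookup = dictionary hit, then one seek + one read."""
        offset = self.index[key]
        with open(self.path, "rb") as f:
            f.seek(offset)
            line = f.readline()
        return line.rstrip(b"\n").split(b"\t", 1)[1]
```

This is the same shape as the FTS approach mentioned above: cheap appends now, a bounded amount of reorganization (here, the startup scan) deferred to later.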