How to do Lazy Map deserialization in Haskell

Question

Similar to this question by @Gabriel Gonzalez: How to do fast data deserialization in Haskell

I have a big Map full of Integers and Text that I serialized using Cereal. The file is about 10 MB.

Every time I run my program I deserialize the whole thing just so I can look up a handful of the items. Deserialization takes about 500ms, which isn't a big deal, but I always seem to like profiling on Friday.

It seems wasteful to always deserialize 100k to 1M items when I only ever need a few of them.

I tried decodeLazy and also changing the map to a Data.Map.Lazy (not really understanding how a Map can be Lazy, but ok, it's there) and this has no effect on the time except maybe it's a little slower.
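For reference, a minimal sketch of what the laziness of Data.Map.Lazy covers: the values are stored as thunks, while the keys and the tree structure are built eagerly. The names `sample` and `found` are made up for illustration. This says nothing about Cereal itself; a decoder that has to parse the whole byte stream to build the map would still do all of that work up front, which would be consistent with the timings above (an assumption, not something measured here).

```haskell
import qualified Data.Map.Lazy as M

-- The value stored under key 1 is a thunk that would fail if evaluated.
sample :: M.Map Int String
sample = M.fromList [(1, error "value 1 was forced"), (2, "two")]

-- Looking up key 2 compares keys and walks the tree, but it never
-- evaluates the value stored under key 1.
found :: Maybe String
found = M.lookup 2 sample
```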

I'm wondering if there's something that can be a bit smarter, only loading and decoding what's necessary. Of course a database like sqlite can be very large but it only loads what it needs to complete a query. I'd like to find something like that but without having to create a database schema.

Update

You know what would be great? Some fusion of Mongo with Sqlite. Like you could have a JSON document database using flat-file storage ... and of course someone has done it https://github.com/hamiltop/MongoLiteDB ... in Ruby :(

Thought mmap might help. Tried the mmap library and segfaulted GHCi for the first time ever. No idea how I can even report that bug.

Tried bytestring-mmap library and that works but no performance improvement. Just replacing this:

ser <- BL.readFile cacheFile

With this:

ser <- unsafeMMapFile cacheFile

Update 2

keyvaluehash may be just the ticket. Performance seems really good. But the API is strange and documentation is missing so it will take some experimenting.

Update 3: I'm an idiot

Clearly what I want here is not lazier deserialization of a Map. I want a key-value database, and there are several options available like dvm, tokyo-cabinet and this levelDB thing I've never seen before.

Keyvaluehash looks to be a native-Haskell key-value database, which I like, but I still don't know about the quality. For example, you can't ask the database for a list of all keys or all values (the only real operations are readKey, writeKey and deleteKey), so if you need that then you have to store it somewhere else. Another drawback is that you have to tell it a size when you create the database. I used a size of 20M so I'd have plenty of room, but the actual database it created occupies 266M. No idea why, since there isn't a line of documentation.

Your question is different (and more interesting!) than Gabriel's. He wanted raw speed deserializing an entire dataset; you want to decode (and perhaps read) only what's needed. I don't think you'll find a drop-in solution; assuming you want to look up items in your `Map`, won't you have to deserialize at least all the keys, absent a particularly clever random-access deserializing scheme? — Christian Conkle, Oct 25 '14 at 00:38
I've never tried anything like this but: use a custom serialization structure which records the keys and their offsets in the stream. Then deserialize the keys with thunks to their value deserializers. If you don't want to deserialize all keys, you can't use a `Data.Map`, and key lookup will be more involved with a binary search on the stream or something. You could use a (memoized) `Key -> Value` function though, if that's all you need. — luqui, Oct 25 '14 at 00:41
You could use something like [`persistent`](https://hackage.haskell.org/package/persistent), which is well-supported. Forget about `keyvaluehash` if you haven't already. It was a toy and is now a dead toy. — dfeuer, Feb 22 '16 at 22:28
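luqui's suggestion above could be sketched roughly as follows. This is only an illustration under stated assumptions: the `Span` type and the functions `buildStore`, `decodeValue` and `lazyView` are hypothetical names, values are encoded as plain UTF-8 instead of with Cereal, and writing the index and payload to disk is left out. Note that the whole index (all keys) is still materialized, which is the cost Christian Conkle's comment points out; only the value decoding is deferred.

```haskell
import qualified Data.ByteString as B
import qualified Data.Map.Lazy as M
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE

-- | Index entry (hypothetical): where a value's bytes live in the payload.
data Span = Span { spanOffset :: !Int, spanLength :: !Int }

-- | Lay the encoded values end to end and record each one's position.
buildStore :: M.Map Int T.Text -> (M.Map Int Span, B.ByteString)
buildStore m = (index, B.concat encoded)
  where
    encoded = map TE.encodeUtf8 (M.elems m)          -- ascending key order
    offsets = scanl (+) 0 (map B.length encoded)     -- running start offsets
    index   = M.fromDistinctAscList
      [ (k, Span off (B.length bytes))
      | (k, bytes, off) <- zip3 (M.keys m) encoded offsets ]

-- | Stand-in value decoder; a real one might call a Cereal decoder instead.
decodeValue :: B.ByteString -> T.Text
decodeValue = TE.decodeUtf8

-- | Lazy view: each value is a thunk that slices and decodes on demand.
lazyView :: M.Map Int Span -> B.ByteString -> M.Map Int T.Text
lazyView index payload = M.map decodeOne index
  where
    decodeOne (Span off len) = decodeValue (B.take len (B.drop off payload))
```

Because `M.map` from Data.Map.Lazy does not force the mapped values, and `B.take`/`B.drop` on strict ByteStrings only adjust offsets, a lookup decodes just the bytes of the requested entry.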

score 1 · Answer 1 · answered Feb 22 '16 at 19:04

One way I've done this in the past is to just make a directory where each file is named by a serialized key. One can use unsafeInterleaveIO to "thunk" the deserialized contents of each read file, so that values are only forced on read...
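A minimal sketch of that idea, not the answerer's actual code: `writeStore` and `openStore` are hypothetical names, keys are assumed to already be valid, unique file names, and values are kept as raw bytes where a real program would run its decoder.

```haskell
import qualified Data.ByteString as B
import qualified Data.Map.Lazy as M
import System.Directory (createDirectoryIfMissing, listDirectory)
import System.FilePath ((</>))
import System.IO.Unsafe (unsafeInterleaveIO)

-- | Write each entry to its own file, named after its key.
-- Assumption: every key is a safe file name (real keys may need escaping).
writeStore :: FilePath -> M.Map String B.ByteString -> IO ()
writeStore dir m = do
  createDirectoryIfMissing True dir
  mapM_ (\(k, v) -> B.writeFile (dir </> k) v) (M.toList m)

-- | Build a Map whose values are deferred file reads.
-- Only the directory listing happens here; each file is read the first
-- time its value is forced.
openStore :: FilePath -> IO (M.Map String B.ByteString)
openStore dir = do
  keys <- listDirectory dir
  vals <- mapM (\k -> unsafeInterleaveIO (B.readFile (dir </> k))) keys
  pure (M.fromList (zip keys vals))
```

One caveat of this design: because the reads are deferred, an I/O error or a file changed after `openStore` only shows up when the corresponding value is eventually forced.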

sclv
