Reading chunks from an SQLite blob when a chunk might span two blobs

Question

I need to read arbitrarily sized (but generally small) chunks of binary data from an SQLite database. The database lives on disk, and the data is stored in rows consisting of an id and a read-only blob of between 256 bytes and 64 KiB (the length is always a power of 2). I use SQLite's incremental blob I/O to read each chunk into a rewritable buffer, then take the average of the values in the chunk and cache the result.

The problem is that, since the chunks are of arbitrary size, the blob size will only very occasionally be an integer multiple of the chunk size. This means that a chunk will quite frequently span two blobs.
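To make the boundary problem concrete, here is a small sketch (in Python, purely for illustration) that treats the blobs as one contiguous byte stream and lists which chunks cross a blob boundary. The function name `straddling_chunks` and its parameters are hypothetical and not part of my actual code.

```python
def straddling_chunks(blob_size, chunk_size, n_blobs):
    """Return the indices of chunks that cross a blob boundary.

    Assumes the blobs are laid end to end as one stream of
    n_blobs * blob_size bytes, and that chunks are read back to back.
    """
    total = blob_size * n_blobs
    straddlers = []
    for i in range(total // chunk_size):
        start = i * chunk_size          # first byte of the chunk
        end = start + chunk_size - 1    # last byte of the chunk
        # The chunk straddles if its first and last bytes lie in different blobs.
        if start // blob_size != end // blob_size:
            straddlers.append(i)
    return straddlers
```

With a power-of-2 chunk size no larger than the blob, the list is empty; with any other chunk size, some chunks will straddle a boundary.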

What I am looking for is a simple and elegant (since 'elegance is not optional') way to handle this slightly awkward scenario. I have a read-chunk function which is fairly dumb, simply reading the chunks and computing averages. So far I have tried the following strategies:

1. Read only the first part of an overlapping chunk, discarding the second.
2. Make read-chunk aware of blob boundaries, so that it can move to the next blob where appropriate.
3. Use something like a ring buffer, so that overlapping chunks can just wrap around the edges.

The first option is the simplest but is unsatisfactory because it discards potentially important information. Since read-chunk is called frequently, I don't want to burden it with too much branching logic, so the second option isn't appealing either. Using a ring buffer (or something like it) seems like an elegant solution. What I envisage is a producer which reads intermediately sized (say, 256-byte) pieces from the blob into a 1k buffer, and a consumer which calls read-chunk on the buffer, wrapping around where appropriate. Since I will always be dealing with powers of 2, the producer will always align to the edges of the buffer, and I can also avoid using mod to compute the indices for both producer and consumer.
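A minimal sketch of the producer/consumer arrangement described above, written in Python for illustration only. It is not my actual implementation: the names `RingBuffer` and `chunk_averages` are hypothetical, and slicing in-memory `bytes` objects stands in for the SQLite incremental blob reads. It assumes the buffer capacity is a power of 2 and that one piece plus one chunk fits in the buffer.

```python
class RingBuffer:
    def __init__(self, capacity=1024):
        # Capacity must be a power of two so a bit mask can replace mod.
        assert capacity > 0 and capacity & (capacity - 1) == 0
        self.buf = bytearray(capacity)
        self.mask = capacity - 1
        self.head = 0  # total bytes written so far (never wraps)
        self.tail = 0  # total bytes consumed so far (never wraps)

    def available(self):
        return self.head - self.tail

    def write(self, piece):
        # Producer side: copy a piece in, wrapping via the mask.
        assert len(piece) <= len(self.buf) - self.available()
        for b in piece:
            self.buf[self.head & self.mask] = b
            self.head += 1

    def read_average(self, n):
        # Consumer side: average the next n bytes, wrapping via the mask.
        assert 0 < n <= self.available()
        total = 0
        for _ in range(n):
            total += self.buf[self.tail & self.mask]
            self.tail += 1
        return total / n


def chunk_averages(blobs, chunk_size, piece_size=256, capacity=1024):
    """Average consecutive chunk_size-byte chunks across all blobs."""
    # Guarantees the producer never overwrites unread bytes.
    assert chunk_size + piece_size <= capacity
    ring = RingBuffer(capacity)
    averages = []
    for blob in blobs:
        for off in range(0, len(blob), piece_size):
            ring.write(blob[off:off + piece_size])  # stand-in for a blob read
            while ring.available() >= chunk_size:
                averages.append(ring.read_average(chunk_size))
    return averages  # a trailing partial chunk is left unread
```

The consumer never needs to know where one blob ends and the next begins; the only branching left is the check on how many bytes are available.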

I am using Lisp (CL), but since this seems to be a general algorithmic or data-structure question, I have left it language-agnostic. What I would like to clarify is which options I have: is there an approach other than the ones I've listed?

For what it's worth - in case anyone's interested - I implemented the last option (ring buffer), and it works nicely. I use the following to efficiently compute the indices - `(defun (a b) (logand a (1- b))` - since the divisor is always a power of 2. — ChrisM, Aug 21 '12 at 22:13
That should, of course, be `(defun mod2 (a b) (logand a (1- b)))`. — ChrisM, Aug 22 '12 at 04:50
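For readers who don't use Lisp, the index computation in the comment above can be rendered in Python roughly as follows. The name `mod_pow2` is hypothetical, and the equivalence with mod holds only under the stated assumption that the divisor is a power of 2 and the index is non-negative.

```python
def mod_pow2(a, b):
    """Return a mod b for non-negative a, assuming b is a power of two.

    Mirrors the Lisp form (logand a (1- b)): subtracting 1 from a power
    of two yields a mask of all the lower bits, and AND keeps only those.
    """
    return a & (b - 1)
```

For a divisor that is not a power of 2 the mask no longer matches mod, so the trick relies on the buffer size staying a power of 2.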

0 Answers