Is there a storage and performance gain to denormalize sqlite3 blob column into primary table, but treat it as a foreign key only in some cases? I have two implementations, and both seem to run slower. Is there some sqlite3 internals that preclude such usage?
I have a ~100GB sqlite file with two tables. The first maps z,x,y coordinates to an ID -- a 32 chars hex string stored as TEXT
. The second table maps that ID to a BLOB, usually a few kilobytes. There are unique indexes for both (z,x,y) and ID. There is a VIEW
that joins both tables.
For ~30% of coordinates, BLOBs are unique per coordinate combination. The rest reference the same ~100 of frequently occurring BLOBs.
I would like to optimize for space and performance: move unique BLOBs into the first table, and keep the second table only as a small 100-row lookup for the few shared BLOBs. The first table's blob could be checked at run time -- if it is exactly the size of the hash key, treat it as a lookup. Otherwise, treat it as value.
My thinking is that this will often avoid a lookup into a large second table, keep small lookup table fully in cache, and avoid storing keys for most of the blobs. My perf testing does not confirm this theory, and I don't understand why.
Original implementation:
CREATE TABLE map (z INTEGER, x INTEGER, y INTEGER, id TEXT);
CREATE TABLE blobs (id TEXT, data BLOB);
CREATE VIEW tiles AS
SELECT z, x, y, data FROM map JOIN blobs ON blobs.id = map.id;
CREATE UNIQUE INDEX map_index ON map (z, x, y);
CREATE UNIQUE INDEX blobs_id ON blobs (id);
Optimized implementation changes ID column in the map table from id TEXT
to mix BLOB
.
CREATE TABLE map (z INTEGER, x INTEGER, y INTEGER, mix BLOB);
I tried two VIEW implementations, both run slower by ~10% than the INNER JOIN method above. The LEFT JOIN
method:
CREATE VIEW tiles AS
SELECT z, x, y,
COALESCE(blobs.data, map.mix) AS data
FROM map LEFT JOIN blobs ON LENGTH(map.mix) = 32 AND map.mix = blobs.id;
And I tried the sub-query approach:
CREATE VIEW tiles AS
SELECT z, x, y,
CASE
WHEN LENGTH(map.mix) = 32 THEN
(SELECT COALESCE(blobs.data, map.mix) FROM blobs WHERE map.mix = blobs.id)
ELSE map.mix
END AS data
FROM map;
P.S. COALESCE()
ensures that in case my data length happened to be 32, but it is not a foreign key, the query should return data as is.
P.P.S. This is an mbtiles file with map tiles, and duplicate tiles represent empty water and land, whereas unique tiles represent places with some unique features, like city streets)