Let's say we log events in an SQLite database with a Unix timestamp column ts:
CREATE TABLE data(ts INTEGER, text TEXT); -- more columns in reality
and that we want fast lookup for datetime ranges, for example:
SELECT text FROM data WHERE ts BETWEEN 1608710000 AND 1608718654;
As it stands, EXPLAIN QUERY PLAN gives SCAN TABLE data, which is bad, so one obvious solution is to create an index with CREATE INDEX dt_idx ON data(ts).
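With the index in place, the same query uses an index search; on a recent SQLite version the plan reads roughly like this (exact wording varies between versions):

EXPLAIN QUERY PLAN SELECT text FROM data WHERE ts BETWEEN 1608710000 AND 1608718654;
-- SEARCH data USING INDEX dt_idx (ts>? AND ts<?)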
Then the problem is solved, but it's rather a poor solution to have to maintain an index for an already-increasing / already-sorted column ts, for which we could use a B-tree search in O(log n) directly. Internally, this will be the index:
ts rowid
1608000001 1
1608000002 2
1608000012 3
1608000077 4
which is a waste of DB space (and CPU when a query has to look in the index first).
To avoid this:
(1) we could use ts as INTEGER PRIMARY KEY, so ts would be the rowid itself. But this fails because ts is not unique: two events can happen at the same second (or even at the same millisecond). See for example the info given in SQLite Autoincrement.
(2) we could use rowid as the timestamp ts concatenated with an increasing number. Example:

16087186540001
16087186540002
[--------][--]
    ts     increasing number
Then rowid is unique and strictly increasing (provided there are fewer than 10k events per second), and no index would be required. A query WHERE ts BETWEEN a AND b would simply become WHERE rowid BETWEEN a*10000 AND b*10000+9999.
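For concreteness, the example range query from above could then be written like this (a sketch, assuming the ×10000 encoding; rowid / 10000 recovers the original timestamp):

SELECT rowid / 10000 AS ts, text
FROM data
WHERE rowid BETWEEN 1608710000 * 10000 AND 1608718654 * 10000 + 9999;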
But is there an easy way to ask SQLite to INSERT an item with a rowid greater than or equal to a given value? Let's say the current timestamp is 1608718654 and two events appear:

CREATE TABLE data(ts_and_incr INTEGER PRIMARY KEY AUTOINCREMENT, text TEXT);
INSERT INTO data VALUES (NEXT_UNUSED(1608718654), 'hello');  -- 16087186540001
INSERT INTO data VALUES (NEXT_UNUSED(1608718654), 'hello');  -- 16087186540002

(Here NEXT_UNUSED is not a real SQLite function; it stands for the desired behaviour of taking the smallest unused value greater than or equal to 1608718654*10000.)
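For what it's worth, here is a minimal sketch of one way this could be emulated in plain SQL, assuming at most 10k events per second and that each insert runs in a transaction (e.g. BEGIN IMMEDIATE) so the MAX lookup and the INSERT are atomic with respect to other writers; whether there is a cleaner, built-in way is part of the question:

BEGIN IMMEDIATE;
INSERT INTO data(ts_and_incr, text)
SELECT COALESCE(MAX(ts_and_incr) + 1, 1608718654 * 10000), 'hello'
FROM data
WHERE ts_and_incr BETWEEN 1608718654 * 10000 AND 1608718654 * 10000 + 9999;
COMMIT;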
More generally, how can time series be created optimally with SQLite, so that queries of the form WHERE timestamp BETWEEN a AND b are fast?