
Let's say we log events in an SQLite database with a Unix timestamp column ts:

CREATE TABLE data(ts INTEGER, text TEXT);   -- more columns in reality

and that we want fast lookup for datetime ranges, for example:

SELECT text FROM data WHERE ts BETWEEN 1608710000 AND 1608718654;

As written, EXPLAIN QUERY PLAN reports SCAN TABLE data, which is bad, so the obvious solution is to create an index with CREATE INDEX dt_idx ON data(ts).
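For reference, this is roughly what the plan looks like before and after adding the index (the exact wording of EXPLAIN QUERY PLAN output varies between SQLite versions):

EXPLAIN QUERY PLAN SELECT text FROM data WHERE ts BETWEEN 1608710000 AND 1608718654;
-- SCAN TABLE data

CREATE INDEX dt_idx ON data(ts);

EXPLAIN QUERY PLAN SELECT text FROM data WHERE ts BETWEEN 1608710000 AND 1608718654;
-- SEARCH TABLE data USING INDEX dt_idx (ts>? AND ts<?)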

That solves the problem, but it is rather a poor solution to have to maintain an index on ts, a column that is already an increasing (i.e. already sorted) sequence on which a B-tree search in O(log n) could be done directly. Internally, the index will look like this:

ts           rowid
1608000001   1
1608000002   2
1608000012   3
1608000077   4

which is a waste of DB space (and of CPU, since every query has to look in the index first).
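As a side note, one way to see how much space the index itself occupies (assuming your SQLite build has the dbstat virtual table compiled in, which the official sqlite3 shell does) is:

SELECT name, SUM(pgsize) AS bytes FROM dbstat GROUP BY name;
-- returns one row for the table data and one for the index dt_idx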

To avoid this:

  • (1) we could use ts as INTEGER PRIMARY KEY, so that ts would be the rowid itself. But this fails because ts is not unique: two events can happen at the same second (or even at the same millisecond).

    See for example the info given in SQLite Autoincrement.

  • (2) we could use the rowid as the timestamp ts concatenated with an increasing number. Example:

     16087186540001      
     16087186540002
     [--------][--]
         ts     increasing number 
    

    Then rowid is unique and strictly increasing (provided there are fewer than 10 000 events per second), and no index would be required. A query WHERE ts BETWEEN a AND b would simply become WHERE rowid BETWEEN a*10000 AND b*10000+9999 (a concrete version of this query is sketched at the end of this list).

    But is there an easy way to ask SQLite to INSERT an item with a rowid greater than or equal to a given value? Let's say the current timestamp is 1608718654 and two events arrive:

      CREATE TABLE data(ts_and_incr INTEGER PRIMARY KEY AUTOINCREMENT, text TEXT);
      INSERT INTO data VALUES (NEXT_UNUSED(1608718654), 'hello');  -- 16087186540001
      INSERT INTO data VALUES (NEXT_UNUSED(1608718654), 'hello');  -- 16087186540002
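For instance, with this encoding the range query from the beginning of the question would become:

-- rowid encodes ts*10000 + counter; NEXT_UNUSED above is a hypothetical helper, not an SQLite function
SELECT text FROM data WHERE rowid BETWEEN 1608710000*10000 AND 1608718654*10000 + 9999;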
    

More generally, how can one create time series optimally with SQLite, so as to have fast queries of the form WHERE timestamp BETWEEN a AND b?

Basj
    You are thinking too hard about micro-optimizations. There are probably other places where you could expend your efforts that would have better pay-back. The canonical solution is to use an index or, perhaps, to use the column as a primary key (but you might want the value in milliseconds or microseconds). – Gordon Linoff Dec 23 '20 at 13:09
    . . (1) How much overhead the index has relative to the data depends on the size of the row. (2) Databases are simply not space-efficient. In general, custom code is faster and smaller. Of course, to get the functionality and reliability of a database, your team may have to spend decades or centuries writing the code. Databases are *tools*. – Gordon Linoff Dec 23 '20 at 15:03
    You might look into the [RTree extension](https://sqlite.org/rtree.html). – Shawn Dec 23 '20 at 17:15
  • @GordonLinoff See my current answer (which might perhaps be improved further): we get a -44% database size improvement (with the same time-range query speed!) by avoiding an additional index, so it's worth investigating. – Basj Dec 27 '20 at 20:28
  • @Shawn Would you have an example with RTree? I've started playing with it, but it has a few requirements: `CREATE VIRTUAL TABLE data USING rtree(id INTEGER PRIMARY KEY, dt INTEGER, label INTEGER); INSERT INTO data(dt, label) VALUES (1600000000, 2);` doesn't work, since we have to work with pairs of `(min, max)` values instead, etc. Which pairs would you use here? – Basj Dec 27 '20 at 21:29

2 Answers


First solution

The method (2) detailed in the question seems to work well. In a benchmark, I obtained:

  • naive method, without index: 18 MB database, 86 ms query time
  • naive method, with index: 32 MB database, 12 ms query time
  • method (2): 18 MB database, 12 ms query time

The key point here is to use dt as an INTEGER PRIMARY KEY, so that it is the row id itself (see also Is an index needed for a primary key in SQLite?), stored in a B-tree, and there is no other hidden rowid column. Thus we avoid an extra index that would maintain a correspondence dt => rowid: here dt is the row id.
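One quick way to check that no separate index structure is created (unlike the naive method, which adds a dt_idx entry) is to list the schema objects; the expected output is sketched in the comments:

SELECT type, name FROM sqlite_master;
-- table | data
-- table | sqlite_sequence    (bookkeeping table created by AUTOINCREMENT)
-- no 'index' row: range queries on dt go through the table's own B-tree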

We also use AUTOINCREMENT, which internally creates a sqlite_sequence table keeping track of the last added ID. This is useful when inserting: since two events can have the same timestamp in seconds (this could happen even with millisecond or microsecond timestamps, as the OS may truncate the precision), we take the maximum of timestamp*10000 and last_added_ID + 1 to make sure it is unique (the COALESCE covers the very first insert, when sqlite_sequence does not yet contain a row):

 MAX(?, COALESCE((SELECT seq FROM sqlite_sequence), 0) + 1)

Code:

import sqlite3, random, time

db = sqlite3.connect('test.db')
db.execute("CREATE TABLE data(dt INTEGER PRIMARY KEY AUTOINCREMENT, label TEXT);")

t = 1600000000
for i in range(1000*1000):
    if random.randint(0, 100) == 0:  # timestamp increases by 1 second with probability 1%
        t += 1
    # dt = max(timestamp*10000, last added id + 1); COALESCE covers the very first insert,
    # when sqlite_sequence is still empty (assumes data is the only AUTOINCREMENT table)
    db.execute("INSERT INTO data(dt, label) VALUES (MAX(?, COALESCE((SELECT seq FROM sqlite_sequence), 0) + 1), 'hello');", (t*10000, ))
db.commit()

# t ranges over a ~10 000 second window
t1, t2 = 1600005000*10000, 1600005100*10000  # time range of width 100 seconds (i.e. 1%)
start = time.time()
for _ in db.execute("SELECT 1 FROM data WHERE dt BETWEEN ? AND ?", (t1, t2)):
    pass
print(time.time() - start)
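The database sizes above can be checked simply by looking at the size of the file on disk, e.g. at the end of the script:

import os
db.close()  # make sure everything is flushed to disk
print(os.path.getsize('test.db') / 1e6, "MB")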

Using a WITHOUT ROWID table

Here is another method, using WITHOUT ROWID, which gives an 8 ms query time. We have to implement the auto-incrementing id ourselves, since AUTOINCREMENT is not available with WITHOUT ROWID tables.
WITHOUT ROWID is useful when we want a PRIMARY KEY(dt, another_column1, another_column2, id) and want to avoid an extra rowid column: instead of one B-tree for rowid and another one for (dt, another_column1, ...), there is just one.

import sqlite3, random, time

db = sqlite3.connect('test_without_rowid.db')  # a fresh database file (name chosen for illustration)

db.executescript("""
    CREATE TABLE autoinc(num INTEGER); INSERT INTO autoinc(num) VALUES(0);

    CREATE TABLE data(dt INTEGER, id INTEGER, label TEXT, PRIMARY KEY(dt, id)) WITHOUT ROWID;

    -- the counter increases on every insert, so each row gets a distinct id and (dt, id) stays unique
    CREATE TRIGGER insert_trigger BEFORE INSERT ON data BEGIN UPDATE autoinc SET num=num+1; END;
    """)

t = 1600000000
for i in range(1000*1000):
    if random.randint(0, 100) == 0:  # timestamp increases by 1 second with probability 1%
        t += 1
    db.execute("INSERT INTO data(dt, id, label) VALUES (?, (SELECT num FROM autoinc), ?);", (t, 'hello'))
db.commit()

# t ranges over a ~10 000 second window
t1, t2 = 1600005000, 1600005100  # time range of width 100 seconds (i.e. 1%)
start = time.time()
for _ in db.execute("SELECT 1 FROM data WHERE dt BETWEEN ? AND ?", (t1, t2)):
    pass
print(time.time() - start)
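As a sanity check, we can also verify that the range query is answered from the (dt, id) primary-key B-tree and not by a full scan (the exact wording of the plan depends on the SQLite version):

# expected: something like "SEARCH data USING PRIMARY KEY (dt>? AND dt<?)"
for row in db.execute("EXPLAIN QUERY PLAN SELECT 1 FROM data WHERE dt BETWEEN ? AND ?", (t1, t2)):
    print(row)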

Roughly-sorted UUID

More generally, the problem is linked to having IDs that are "roughly sorted" by datetime; schemes such as ULID or Twitter's Snowflake are built around the same idea.

All these methods use an ID of the form:

[---- timestamp ----][---- random and/or incremental ----]
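As a purely illustrative sketch (not taken from any particular scheme), such an ID can be generated in the application from a second-resolution timestamp and a per-process counter:

import itertools, time

_counter = itertools.count(1)  # per-process counter

def roughly_sorted_id():
    # [timestamp][increment]: seconds * 10000 + a wrapping counter;
    # assumes fewer than 10 000 ids per second and a single process;
    # real schemes (ULID, Snowflake, ...) add randomness and/or machine ids
    return int(time.time()) * 10000 + next(_counter) % 10000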
Basj
  • These are indeed "naive" methods. Just open `file::memory:` ... – Martin Zeitler Jan 02 '21 at 04:23
  • I usually use `:memory:` database for my tests, but here I simulate a real situation (on-disk database) @MartinZeitler. – Basj Jan 02 '21 at 05:07
    Then you probably shouldn't complain about the bottle-neck involved. Unless resorting an ascending index, performance might not improve much. Instead of sorting that all into the same table, it might help to create an optimized table per series. This still can be loaded into memory... while such combined indexes seem rather crappy to me. Simplicity always equals performance... and if the record-set is not simple enough, it should be simplified. – Martin Zeitler Jan 02 '21 at 05:18
  • You are adding random data in your database so your result set differs every time you run this script. You cannot get reliable timing with this approach, this is comparing apples to oranges. Plus, using a transaction would significantly reduce the duration of the INSERT statements. In addition I completely agree to the "simplicity equals performance" statement from @MartinZeitler - don't use composite data in a single column. That is bad data modelling practice. – schlamar Jan 14 '23 at 11:17
  • BTW I have ported your example to Rust which shows a performance around ~700µs for the query. So I think you are mainly benchmarking Python iterating the result set and not the query time of SQLite. https://gist.github.com/schlamar/d500bec281c22fe6d3a8aee702e0b28d – schlamar Jan 14 '23 at 11:47
  • I just checked the "naive" approach without index with my Rust implementation and a fixed result set and the performance is much worse than your results indicate. Query time is ~50ms compared to ~800µs from method (2), so over 60x slower! – schlamar Jan 14 '23 at 13:09
  • Plus, you shouldn't ignore insert times. The WITHOUT ROWID approach is ~4x slower on inserting data than naive with index. – schlamar Jan 14 '23 at 13:20
  • @schlamar Can you post a new answer with all your comments + maybe some code you tested? It would be useful and interesting! (easier to read in a new answer than these comments). – Basj Jan 15 '23 at 21:24
  • I don't think my comments qualify as an answer. I uploaded my test code to Github if that helps: https://github.com/schlamar/rusqlite-performance-tests – schlamar Jan 22 '23 at 17:16
  • @schlamar I think your comments + link to your Github qualify as an answer. At least a summary of your findings will be useful to future readers. Even for us when we'll read this topic in a few months/years, a summary will be useful. – Basj Jan 22 '23 at 20:20

I am not an expert in SQLite, but I have worked with databases and time series. I have had a similar situation previously, and I will share my conceptual solution.

You somehow already have part of the answer in your question, but not the way of doing it.

The way I did it was to create two tables: one table (main_logs) logs the time in one-second increments, with the date as an integer primary key, and the other table (main_sub_logs) contains all the logs made in that particular second, which in your case can be up to 10,000 logs per second. main_sub_logs has a reference to main_logs, and for each second it contains that second's logs, each with its own counter id that starts over again for every second.

This way you limit your time-series lookup to the seconds of the event window instead of searching through all logs in one place.

This way you can join those two tables, and when you look up a range between two specific times in the first table, you get all the logs in between.

So here is how I created my two tables:

CREATE TABLE IF NOT EXISTS main_logs (
  id INTEGER PRIMARY KEY
);

CREATE TABLE IF NOT EXISTS main_sub_logs (
   id INTEGER,
   ref INTEGER,
   log_counter INTEGER,
   log_text text,
   PRIMARY KEY (id), 
   FOREIGN KEY (ref) REFERENCES main_logs(id)
);

I have inserted some dummy data into both tables.

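For illustration, adding one log entry could look roughly like this (a hypothetical sketch, the exact statements are not shown in the answer):

-- make sure the row for this second exists
INSERT OR IGNORE INTO main_logs(id) VALUES (1608718655);

-- add the log entry, numbering it within its second
INSERT INTO main_sub_logs(id, ref, log_counter, log_text)
VALUES (
    (SELECT COALESCE(MAX(id), 0) + 1 FROM main_sub_logs),      -- global id
    1608718655,                                                 -- parent second
    (SELECT COALESCE(MAX(log_counter), 0) + 1
       FROM main_sub_logs WHERE ref = 1608718655),              -- per-second counter
    'hello'
);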

Now let's query all logs between 1608718655 and 1608718656:

SELECT * FROM main_logs AS A
JOIN main_sub_logs AS B ON A.id = B.ref
WHERE A.id >= 1608718655 AND A.id <= 1608718656;

This will return all the log rows belonging to those two seconds.


Maytham Fahmi
    Thank you for your answer and the update! Little questions: 1) In real life cases, how do you INSERT the rows, especially about `log_counter`? It seems we have to manually keep another table with the latest `log_counter` value, don't we? Can you give an example of your INSERT? --2) Since you use two tables (so there are two B-trees structure internally), what is the benefit of this method compared to just one table + an ordinary index on the unix timestamp column? I have started some benchmarks and I don't currently see how this improves from the "1 table + index on timestamp" solution. – Basj Dec 27 '20 at 20:03