1

I have an SQLite DB in my application which is used to store temporal tasks data (events with "begin" and "end" timestamps). I need to optimize the query of tasks overlapping the given time window (including tasks partially or fully overlapping the window). Note that simple query of all tasks fully fitting in window works pretty fast, but since I need also get partially and fully overlapping tasks it's not exactly what I need. In example below I create table named "tasks" and complex index "tasks_index" by "begin" and "end" columns.

CREATE TABLE "tasks" ( "id" INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT, "begin" BIGINT NOT NULL, "end" BIGINT NOT NULL )
CREATE INDEX "tasks_index" ON "tasks" ( "begin", "end" )

Tasks table:

----+-------+-------+
 id | begin | end   |
----+-------+-------+
 1  | 2     | 6     |
 2  | 9     | 14    |
 3  | 16    | 22    |
 4  | 24    | 28    |
 5  | 30    | 34    |
 6  | 36    | 42    |
 7  | 44    | 50    |
 8  | 9     | 27    |
 9  | 29    | 42    |
 10 | 4     | 47    |

Tasks on time-line:

   [-------------------10---------------------]
        [--------8--------] [-----9------]
 [-1-]  [--2-] [--3--] [-4-] [-5-] [--6--] [--7--]
----------(++++++++++++++++++++++++++)-------------->
          11                         21          time

Query with time window from 11 to 21:

SELECT id FROM tasks WHERE end > 11 AND begin < 21
Result: { 2, 3, 4, 5, 6, 8, 9, 10 }

NOTE: tasks 3-5 fully fitting in window, tasks 2, 6, 8, 9 partially overlapping with window and task 10 is fully overlapping the window

PROBLEM: I've noticed that when I have large amount of tasks (~7 millions) and query them using small window located close to the right end of timeline the query becomes extremely slow and it works slower as the window comes closer to the right end. Query plan shows that search of "tasks" is using "tasks_index" only partially using just "begin" column, while the rest of the query seems to be performed without using the index and becomes slow when first part of selection returns large set of data.

EXPLAIN QUERY PLAN
SELECT id FROM tasks WHERE end > 11 AND begin < 21
SEARCH TABLE tasks USING INDEX tasks_index (begin<?) - end is not used from index!

Could you propose a different index or DB structure organization to optimize such kind of queries? If this is some kind of SQLite limitation, could you propose other in-process database engine?

1 Answers1

0

I am not an expert of Sqlite, but what you need is probably the SQLite R*Tree Module

Please refer to this question as well: SQLite and range query

gerichhome
  • 1,872
  • 2
  • 15
  • 21
  • Thank for fast answer! I'll try to use SQLite compiled with R*Tree and will write about performance results. It's a little bit difficult to do because I'm currently using precompiled sqlite from QT distribution. – Evgeny Gorodetsky Mar 28 '18 at 15:27
  • R\*Tree module worked with QT-plugin qsqlite.dll out of the box - no recompilation required. Range query performance is much better compared to the initial attempt. Unfortunately it's problematic to gain performance in queries involving both R\*Tree index and normal indexing - this problem is discussed in [SQLite query using rtree and normal index slow](https://stackoverflow.com/questions/44118661/sqlite-query-using-rtree-and-normal-index-slow) – Evgeny Gorodetsky Mar 29 '18 at 19:54