According to ClickHouse documentation, when reading a single range of the primary key, up to index_granularity * 2
extra rows in each data block can be read.
Why index_granularity * 2
extra rows? I can't figure this out.
Take the
(CounterID, Date)
primary key as an example. In this case, the sorting and index can be illustrated as follows:Whole data: [---------------------------------------------] CounterID: [aaaaaaaaaaaaaaaaaabbbbcdeeeeeeeeeeeeefgggggggghhhhhhhhhiiiiiiiiikllllllll] Date: [1111111222222233331233211111222222333211111112122222223111112223311122333] Marks: | | | | | | | | | | | a,1 a,2 a,3 b,3 e,2 e,3 g,1 h,2 i,1 i,3 l,3 Marks numbers: 0 1 2 3 4 5 6 7 8 9 10
If the data query specifies:
CounterID in ('a', 'h')
, the server reads the data in the ranges of marks[0, 3)
and[6, 8)
.CounterID IN ('a', 'h') AND Date = 3
, the server reads the data in the ranges of marks[1, 3)
and[7, 8)
.Date = 3
, the server reads the data in the range of marks[1, 10]
.The examples above show that it is always more effective to use an index than a full scan.
A sparse index allows extra data to be read. When reading a single range of the primary key, up to
index_granularity * 2
extra rows in each data block can be read.