6

LowCardinality fields in ClickHouse are an optimization where the values are dictionary-encoded for faster lookups and smaller storage. As per documentation:

The efficiency of using LowCarditality data type depends on data diversity. If a dictionary contains less than 10,000 distinct values, then ClickHouse mostly shows higher efficiency of data reading and storing. If a dictionary contains more than 100,000 distinct values, then ClickHouse can perform worse in comparison with using ordinary data types.

What about UInt8 values used as Boolean? Cardinality is 2, but with such small footprint(8), would it actually provide a benefit in queries to use it?

Daniel Heilper
  • 1,182
  • 2
  • 17
  • 34
xmar
  • 1,729
  • 20
  • 48
  • 1
    They are *not* “turned into a dictionary” - the values are [dictionary (en)coded](https://en.wikipedia.org/wiki/Dictionary_coder). Even without any dictionary overhead, the size of encoded data can be no less than the size of the dictionary key (and any additional overhead). Note that https://clickhouse.tech/docs/en/operations/settings/settings/ has a threshold of 8 bytes. – user2864740 Jun 11 '20 at 15:35
  • 1
    You are right, edited my question. – xmar Jun 11 '20 at 15:38
  • Anyway, in the settings, note that allow_suspicious_low_cardinality_types has a threshold of 8 bytes (800% over). It then gives an explanation for multiple reasons. Try it and see? – user2864740 Jun 11 '20 at 15:41

2 Answers2

12

LowCardinality has sense mostly for String type.

LowCardinality(UInt8) is always worse than UInt8.

There are very rare cases where LowCardinality makes sense for numeric types. But I would not even test it because it wasting of time. Pointer to a LC dictionary takes (Int8-Int32) in a .bin file so it's cheaper in disk space and CPU to store numeric value itself in .bin file.

xmar
  • 1,729
  • 20
  • 48
Denny Crane
  • 11,574
  • 2
  • 19
  • 30
2

According to https://clickhouse.com/docs/en/whats-new/changelog/#new-feature ClickHouse now supports a native Bool type. It is essentially a UInt8 restricted to the values 0 and 1, but it will also serialize to and from true/false in formats such as JSON.

rjh
  • 49,276
  • 4
  • 56
  • 63