3
create database test

create table test.A (id UInt8, data String) engine MergeTree() order by (id)

create table test.B (id UInt8, data String) engine MergeTree() order by (id)

insert into test.A values (0,'a'),(1,'b'),(2,'c')

insert into test.B values (1,'x'),(2,'y'),(3,'z')
select *, isNull(a.id), isNull(b.id) from test.A a full join test.B b on a.id = b.id

enter image description here

How to understand if 0 is 0 or 0 is Null?

Alexey Golyshev
  • 792
  • 6
  • 11
  • Does `is null` have the same behavior? It is also strange that the ids are showing up as `0` rather than `NULL` (or blank). – Gordon Linoff Apr 29 '21 at 10:57
  • I don't know ClickHouse, but it seems `Null` values for `UInt8` datatype are being defaulted to `0`? Which is then not equal to `Null`, so `isNull` returns `0`(== "false")? Just my impression. Have you tried the same for the String Columns? I'd expect different behavior, there. – Fildor Apr 29 '21 at 10:58
  • Hello @GordonLinoff. Yes, the same behavior. – Alexey Golyshev Apr 29 '21 at 10:59
  • Hello @Fildor. For `String` I will receive "" but not Null. – Alexey Golyshev Apr 29 '21 at 11:02
  • 1
    By default CH JOIN and other functions does not produce Nulls. Because CH did not support Nulls in early versions and because Nullable values make CH twice slower. https://clickhouse.tech/docs/en/operations/settings/settings/#aggregate_functions_null_for_empty https://clickhouse.tech/docs/en/operations/settings/settings/#join_use_nulls – Denny Crane Apr 29 '21 at 12:32

1 Answers1

4

The issue is how the outer join is processed. According to the documentation, this uses a setting join_use_nulls:

Sets the type of JOIN behaviour. When merging tables, empty cells may appear. ClickHouse fills them differently based on this setting.

Possible values:

0 — The empty cells are filled with the default value of the corresponding field type. 1 — JOIN behaves the same way as in standard SQL. The type of the corresponding field is converted to Nullable, and empty cells are filled with NULL.

Default value: 0.

In other words, the default approach is to be non-SQL compliant. Personally, I see no use for such a default. I guess there is a strong aversion to NULL values at Clickhouse.

So, you can instead check a.id = b.id to see if there is a match.

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
  • 2
    CH uses SSE/CPU instructions for calculations for columns and other optimization. But Null does not fit into a bit space of Int64/Int16 ... And CPUs don't know how to process Nulls, SO Nullable types make queries twice slower. And there is no good/easy way to fix it. – Denny Crane Apr 29 '21 at 12:37
  • 1
    Also for backward compatibility. 5 years ago CH did not support Nulls for joins. – Denny Crane Apr 29 '21 at 12:40