0

In my Clickhouse server, I have a table with an Array of Integers field:

CREATE TABLE my_table
(
  ...
  my_array_field Array(UInt32),
  ...

)

Pretty simple definition.

But now I want to filter the records matching a condition like this:

Any of the itens inside my_array_field is between 5000 and 6000


In postgres I was doing something like this:

SELECT * FROM my_table WHERE EXISTS 
(
  SELECT 1 FROM unnest(my_array_field) AS my_array_field WHERE my_array_field BETWEEN 5000 AND 6000
)

But now I have the same challenge in clickhouse and need some help.

Thanks in advance!

Mauricio Moraes
  • 7,255
  • 5
  • 39
  • 59

2 Answers2

2
select * from my_table 
where arrayExists( i -> (i > 5000 and i < 6000) , my_array_field)
Denny Crane
  • 11,574
  • 2
  • 19
  • 30
  • I used your answer in my code! Thanks a lot. But in my case it was failing because the array can be empty. So I had to add: `WHERE notEmpty(my_array_field) AND arrayExists(x ->...` If you could add that to your answer, that would be even more awesome. – Mauricio Moraes Mar 05 '21 at 22:59
  • ???? it's impossible. arrayExists returns 0 for emtpy arrays. `notEmpty(my_array_field) AND` has no sense. – Denny Crane Mar 06 '21 at 00:38
1

Try to use arrayJoin function:

SELECT
    *,
    arrayJoin(my_array_field) AS val
FROM my_table
WHERE val BETWEEN 5000 AND 6000

Look for details ARRAY JOIN Clause.


If it experienced some performance issues consider adding two extra columns to store max and min values in array:

CREATE TABLE my_table
(
  ...
  my_array_field Array(UInt32),
  UInt32 min_my_array_field,
  UInt32 max_my_array_field,
  ...
)

INSERT INTO my_table
VALUES (.., arr, arrayMin(arr), arrayMax(arr), ..)
SELECT
    *,
    arrayJoin(my_array_field) AS val
FROM
(
    SELECT *
    FROM my_table
    WHERE (min_my_array_field <= 6000) AND (max_my_array_field >= 5000)
)
WHERE val BETWEEN 5000 AND 6000

Look at skipping indexes : for example bloom_filter supports Array-type.

vladimir
  • 13,428
  • 2
  • 44
  • 70