0

I'm using Apache Beam Java SDK to process events and write them to the Clickhouse Database. Luckily there is ready to use ClickhouseIO.

ClickhouseIO accumulates elements and inserts them in batch, but because of the parallel nature of the pipeline it still results in a lot of inserts per second in my case. I'm frequently receiving "DB::Exception: Too many parts" or "DB::Exception: Too much simultaneous queries" in Clickhouse.

Clickhouse documentation recommends doing 1 insert per second. Is there a way I can ensure this with ClickhouseIO? Maybe some KV grouping before ClickhouseIO.Write or something?

1 Answers1

1

It looks like you interpret these errors not quite correct:

DB::Exception: Too many parts

It means that insert affect more partitions than allowed (by default this value is 100, it is managed by parameter max_partitions_per_insert_block).

So either the count of affected partition is really large or the PARTITION BY-key was defined pretty granular.

How to fix it:

  • try to group the INSERT-batch such way it contains data related to less than 100 partitions
  • try to reduce the size of insert-block (if it quite huge) - withMaxInsertBlockSize
  • increase the limit max_partitions_per_insert_block in SQL-query (like this, INSERT .. SETTINGS max_partitions_per_insert_block=300 (I think ClickhouseIO should have the ability to set custom options on query level)) or on server-side by modifying userprofile-settings

DB::Exception: Too much simultaneous queries

This one managed by param max_concurrent_queries.

How to fix it:

vladimir
  • 13,428
  • 2
  • 44
  • 70
  • Thank you for your answer! From what I've seen in the following discussions: https://github.com/ClickHouse/ClickHouse/issues/3174 https://github.com/ClickHouse/ClickHouse/issues/4050 "DB::Exception: Too many parts" can be caused by the too frequent inserts. Actually I'm interested in exactly what you've mentioned "reduce the count of concurrent queries by Beam means". How to reduce the count of concurrent queries with Apache Beam using ClickhouseIO? – Oleksandr Bushkovskyi Mar 12 '20 at 09:40
  • 1
    Sorry, I am not a specialist in Apache Beam and can only suppose as it works - probably exists the means to manage the 'count of writer-workers' or smth like this. Better to asks for Beam-community about it and the author of [ClickhouseIO](https://github.com/apache/beam/pulls?q=clickhouseio). – vladimir Mar 12 '20 at 20:29