-1

I'm inserting a lot of CSV data files into remote Clickhouse database that already has a lot of data. I'm doing it using simple script like this:

...
 for j in *.csv; do
      clickhouse-client --max_insert_threads=32 --receive_timeout=30000 --input_format_allow_errors_num=999999999 --host "..." --database "..." --port 9000 --user "..." --password "..." --query "INSERT INTO ... FORMAT CSV" < "$j"
  done
...

So my question is: how to optimize these inserts? I already used these options for optimization:

--max_insert_threads=32 --receive_timeout=30000

Are there any more options in clickhouse-client I should use for better performance and for what purpose? One file can be like 300-500mb (and sometimes more). According to this article using parallel processes won't help that's why I'm inserting one file at time.

Arzybek
  • 547
  • 2
  • 6
  • 27

1 Answers1

1

max_insert_threads is not applicable here, it's about insert select inside CH server.

According to this article using parallel processes won't help

It should help (it depends on CPU and disk power), just try

# parallelism 6 (-P6)

find . -type f -name '*.csv' | xargs -P 6 -n 1 clickhouse-client --input_format_parallel_parsing=0 --receive_timeout=30000 --input_format_allow_errors_num=999999999 --host "..." --database "..." --port 9000 --user "..." --password "..." --query "INSERT INTO ... FORMAT CSV"

I set input_format_parallel_parsing=0 deliberately, it improves total performance in case of multiple parallel loads.

Denny Crane
  • 11,574
  • 2
  • 19
  • 30
  • Yeah, already did, it helped a little bit, but not that much, I will try this option you suggest, maybe it will help. What about max_insert_threads, I should delete it then if it won't help? – Arzybek Jan 25 '22 at 14:39
  • you can leave max_insert_threads it means nothing in this case. – Denny Crane Jan 25 '22 at 14:48
  • How many CPU cores and what disks do you use? With powerful servers -P6 increases performance linearly - 6 times. – Denny Crane Jan 25 '22 at 14:50
  • Also try `--compress=1` argument or `--compress=0` – Denny Crane Jan 25 '22 at 14:51
  • I didn't find any information about ```--compress``` flag, can you tell what does it do or where to find information about it? – Arzybek Jan 25 '22 at 20:32
  • `clickhouse-client --help|grep compre --compression arg enable or disable compression --network_compression_method arg Allows you to select the method of data compression when writing. --network_zstd_compression_level arg Allows you to select the level of ZSTD compression. ` it compress network traffic between a client and a server – Denny Crane Jan 25 '22 at 21:53
  • Thanks a lot, last question, I'm using 9000 port, just to make sure, what's better - http (8123) or 9000? – Arzybek Jan 26 '22 at 06:07
  • 9000 is TCP. 8123 is HTTP. No better or worse, they're just different. clickhouse-client able to use only 9000 (TCP), other libraries/applications only HTTP (for example official JDBC driver) – Denny Crane Jan 26 '22 at 15:43
  • When trying this import I get: Code: 36. DB::Exception: Positional options are not supported. (BAD_ARGUMENTS) – Simon Knight Feb 06 '23 at 12:42