I'm inserting a lot of CSV data files into a remote ClickHouse database that already holds a lot of data. I'm doing it with a simple script like this:
...
# Stream each CSV file to the server sequentially, one clickhouse-client run per file.
for j in *.csv; do
    clickhouse-client --max_insert_threads=32 --receive_timeout=30000 --input_format_allow_errors_num=999999999 \
        --host "..." --database "..." --port 9000 --user "..." --password "..." \
        --query "INSERT INTO ... FORMAT CSV" < "$j"
done
...
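To check whether a change actually helps, one way I've found is to pipe each file through pv (an assumption on my part that it's installed; plain time around each command would also work) so the throughput of every insert is visible:

# Pipe the file through pv so the current transfer rate of each insert is shown.
for j in *.csv; do
    pv "$j" | clickhouse-client --max_insert_threads=32 --receive_timeout=30000 \
        --host "..." --database "..." --port 9000 --user "..." --password "..." \
        --query "INSERT INTO ... FORMAT CSV"
done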
So my question is: how can I optimize these inserts? I have already used these options for that purpose:
--max_insert_threads=32 --receive_timeout=30000
Are there any other clickhouse-client options I should use for better performance, and what would each one do? One file can be around 300-500 MB (and sometimes more). According to this article, using parallel processes won't help, which is why I'm inserting one file at a time.
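For completeness, this is the variant I'm experimenting with right now. The extra settings (input_format_parallel_parsing, max_insert_block_size, min_insert_block_size_rows/bytes) are guesses from reading the settings docs, not options I know to be right for this workload:

# Same loop, with larger insert blocks and parallel CSV parsing enabled on the client.
for j in *.csv; do
    clickhouse-client --max_insert_threads=32 --receive_timeout=30000 --input_format_allow_errors_num=999999999 \
        --input_format_parallel_parsing=1 \
        --max_insert_block_size=1048576 \
        --min_insert_block_size_rows=1048576 --min_insert_block_size_bytes=268435456 \
        --host "..." --database "..." --port 9000 --user "..." --password "..." \
        --query "INSERT INTO ... FORMAT CSV" < "$j"
done

My understanding is that bigger insert blocks mean fewer, larger parts written on the server, and that parallel parsing speeds up turning the CSV into blocks on the client side, but I'd welcome a correction if that's wrong.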