I'm following the instructions on Scaling Out Data Ingestion, with this command:
find . -type f | xargs -n 1 -P 320 sh -c 'echo $0 `copy_to_distributed_table -C $0 table_name`'
My cluster has a master and eight workers, each with two SSDs. The table is spread across 320 shards.
Data loading is taking a very long time: the average insertion rate is about 750k rows per minute. Is that normal, or is there a way to speed it up?
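To put that rate in perspective, here is a rough back-of-the-envelope breakdown. It assumes the 750k figure counts rows, and that the load is spread evenly across the eight workers and the 320 parallel xargs processes, which may not hold in practice:

```shell
# Figures taken from the question above.
# Assumption: the rate is measured in rows, and load is evenly distributed.
rows_per_min=750000
workers=8
parallel_procs=320   # matches the -P 320 passed to xargs

# Integer arithmetic, so results are rounded down.
rows_per_sec=$((rows_per_min / 60))
rows_per_sec_per_worker=$((rows_per_sec / workers))
rows_per_sec_per_proc=$((rows_per_sec / parallel_procs))

echo "cluster:           $rows_per_sec rows/s"
echo "per worker:        $rows_per_sec_per_worker rows/s"
echo "per xargs process: $rows_per_sec_per_proc rows/s"
```

If those assumptions are roughly right, each individual copy process is only inserting a few dozen rows per second, which is why I suspect something is holding the load back.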
The only thing I can think of is that I have replication enabled. Should I turn it off for the load and re-enable it afterwards?