I am trying to load a large amount of data from my SCADA system into CrateDB (4 years, multiple CSV files, about 87 GB, several billion datapoints/rows).
Writing the data is currently slow. How can I improve the insert speed? I use the Crate JDBC driver and I already use JDBC bulk inserts.
On my system I can only load about 1500 values per second (8 GB RAM, 4 GB heap, RAID 10 with 5x7k disks).
On the same machine, InfluxDB can load about 80000 values per second (with the same client program, but not via JDBC)!
I do not expect 80 kHz with Crate, but hopefully more than 1.5 kHz; 20 kHz would be acceptable. At the current rate it will take days or weeks to load the data.
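To put the rates above into perspective, here is a rough back-of-the-envelope estimate of the total load time. The exact row count is not stated above, so the 3 billion rows used here are purely an illustrative assumption, and the class and method names are made up for this sketch.

```java
class LoadTimeEstimate {
    // Seconds needed to insert `rows` rows at a sustained rate of `rowsPerSecond`.
    static double seconds(long rows, double rowsPerSecond) {
        return rows / rowsPerSecond;
    }

    // Same estimate expressed in days (86,400 seconds per day).
    static double days(long rows, double rowsPerSecond) {
        return seconds(rows, rowsPerSecond) / 86_400.0;
    }

    public static void main(String[] args) {
        // ASSUMPTION: 3 billion rows, chosen only for illustration.
        long assumedRows = 3_000_000_000L;
        // Current CrateDB rate, acceptable target rate, observed InfluxDB rate.
        for (double rate : new double[] {1_500, 20_000, 80_000}) {
            System.out.printf("%.0f rows/s -> %.1f days%n", rate, days(assumedRows, rate));
        }
    }
}
```

Under that assumption, 1500 rows per second works out to roughly 23 days, 20000 rows per second to under 2 days, and 80000 rows per second to about 10 hours.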
My DB table looks like this:
CREATE TABLE EVENTHISTORY (
tag string NOT NULL,
ts TIMESTAMP NOT NULL,
value_number double INDEX OFF,
value_string string INDEX OFF,
value_timestamp TIMESTAMP INDEX OFF,
status long INDEX OFF,
manager integer INDEX OFF,
user_ integer INDEX OFF,
primary key (tag, ts)
);
I tried removing the primary key, but it made no difference.
I use multiple threads (4 to 16, no difference) to write the data in 8k bulks (one bulk = 8096 rows).
For threading I use JDBC connection pooling (org.apache.commons.dbcp2.BasicDataSource).
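For reference, the write path described above (fixed-size bulks, JDBC batch inserts, several writer threads sharing a connection pool) can be sketched roughly as follows. This is a simplified illustration, not the actual loader: the class BulkLoaderSketch, the Row holder, and the method names are hypothetical, only three of the table's columns are written, and the pool is passed in as a plain javax.sql.DataSource (a BasicDataSource would fit here) so the sketch only needs the JDK to compile.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.sql.Timestamp;
import java.sql.Types;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import javax.sql.DataSource;

class BulkLoaderSketch {
    // Hypothetical row holder covering only three of the table's columns.
    static final class Row {
        final String tag;
        final long tsMillis;
        final Double valueNumber; // may be null

        Row(String tag, long tsMillis, Double valueNumber) {
            this.tag = tag;
            this.tsMillis = tsMillis;
            this.valueNumber = valueNumber;
        }
    }

    static final String INSERT_SQL =
        "INSERT INTO eventhistory (tag, ts, value_number) VALUES (?, ?, ?)";

    // Splits the rows into consecutive bulks of at most bulkSize rows each.
    static <T> List<List<T>> partition(List<T> rows, int bulkSize) {
        if (bulkSize <= 0) {
            throw new IllegalArgumentException("bulkSize must be positive");
        }
        List<List<T>> bulks = new ArrayList<>();
        for (int i = 0; i < rows.size(); i += bulkSize) {
            int end = Math.min(i + bulkSize, rows.size());
            bulks.add(new ArrayList<>(rows.subList(i, end)));
        }
        return bulks;
    }

    // Sends one bulk with a single executeBatch() call on a pooled connection.
    static int writeBulk(DataSource pool, List<Row> bulk) throws SQLException {
        try (Connection conn = pool.getConnection();
             PreparedStatement ps = conn.prepareStatement(INSERT_SQL)) {
            for (Row r : bulk) {
                ps.setString(1, r.tag);
                ps.setTimestamp(2, new Timestamp(r.tsMillis));
                if (r.valueNumber == null) {
                    ps.setNull(3, Types.DOUBLE);
                } else {
                    ps.setDouble(3, r.valueNumber);
                }
                ps.addBatch();
            }
            // One result entry per batched row.
            return ps.executeBatch().length;
        }
    }

    // Writes all bulks from a fixed pool of worker threads and
    // returns the number of rows handed to executeBatch().
    static long load(DataSource pool, List<Row> rows, int bulkSize, int threads)
            throws Exception {
        ExecutorService workers = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Integer>> pending = new ArrayList<>();
            for (List<Row> bulk : partition(rows, bulkSize)) {
                pending.add(workers.submit(() -> writeBulk(pool, bulk)));
            }
            long total = 0;
            for (Future<Integer> f : pending) {
                total += f.get(); // rethrows any failure from a worker
            }
            return total;
        } finally {
            workers.shutdown();
        }
    }
}
```

With the numbers above, a call would look like load(pool, rows, 8096, 8), where pool is the configured connection pool. Each worker borrows its own connection, so the pool should allow at least as many connections as there are threads.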
How can I improve the writing speed of CrateDB?