I am trying to load a large amount of data from my SCADA system into CrateDB (4 years, multiple CSV files, about 87 GB, several billion data points/rows).

Writing data is currently slow. How can I improve the insert speed? I use the Crate JDBC driver and I already use JDBC bulk inserts.

On my system I can only load about 1500 values per second (8 GB RAM, 4 GB heap, RAID 10 with 5x7k disks).

On the same machine, InfluxDB can load about 80000 values per second (with the same client program, but not via JDBC)!

I do not expect 80 kHz with Crate, but hopefully more than 1.5 kHz; 20 kHz would be acceptable. As it stands, loading the data will take days or weeks.
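
To put those rates in perspective, here is a rough back-of-the-envelope estimate in Python. The row count of one billion is only an assumption for illustration (the real data set is "some billions" of rows), and `load_time_days` is a hypothetical helper name.

```python
SECONDS_PER_DAY = 86_400


def load_time_days(rows, rows_per_second):
    """Return the time in days needed to insert `rows` at a constant rate."""
    return rows / rows_per_second / SECONDS_PER_DAY


if __name__ == "__main__":
    rows = 1_000_000_000  # assumed row count, for illustration only
    # Rates mentioned above: current CrateDB rate, acceptable rate, InfluxDB rate.
    for rate in (1_500, 20_000, 80_000):
        print(f"{rate:>6} rows/s: {load_time_days(rows, rate):.2f} days")
```

The estimate scales linearly, so each additional billion rows adds the same amount of time at a given rate.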

My table looks like this:

CREATE TABLE EVENTHISTORY (
tag string NOT NULL,
ts TIMESTAMP NOT NULL,
value_number double INDEX OFF,
value_string string INDEX OFF,
value_timestamp TIMESTAMP INDEX OFF,
status long INDEX OFF,
manager integer INDEX OFF,
user_ integer INDEX OFF,
primary key (tag, ts)
);

I tried removing the primary key, but it made no difference.

I use multiple threads (4 to 16, no difference) to write data in 8k bulks (one bulk = 8096 rows).

For threading I use JDBC connection pooling via org.apache.commons.dbcp2.BasicDataSource.

How can I improve the writing speed of CrateDB?

Cœur
1 Answer

Sorry to hear that you are struggling with insert speed in CrateDB. At Crate.io we have benchmarked more than 800k inserts/sec on bigger clusters, so your issue seems really strange. First, can you send us some information about your cluster setup, such as the number of nodes, the CrateDB version, etc.? This would really help us reproduce it.

To narrow down the problem and see whether this is really a server issue, you could convert your CSV files to JSON first and then do a bulk import using COPY FROM. If that is still slow, we know the problem is on the server side and can investigate there.
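
A minimal sketch of that conversion, assuming each CSV file starts with a header row whose names match the column names of the EVENTHISTORY table (the real SCADA export layout is not known here, so adjust accordingly). It writes one JSON object per line, which is the shape COPY FROM reads. `csv_to_json_lines` is a hypothetical helper name, and timestamps are passed through unchanged.

```python
import csv
import io
import json

# Numeric columns of the EVENTHISTORY table; all other columns stay strings.
FLOAT_COLUMNS = {"value_number"}
INT_COLUMNS = {"status", "manager", "user_"}


def csv_to_json_lines(csv_file, json_file):
    """Convert CSV rows (with a header line) to one JSON object per line.

    Empty fields are left out of the object so they load as NULL.
    Returns the number of rows written.
    """
    count = 0
    for row in csv.DictReader(csv_file):
        record = {}
        for column, text in row.items():
            if column is None or not text:
                continue  # surplus or empty field: leave the column out
            if column in FLOAT_COLUMNS:
                record[column] = float(text)
            elif column in INT_COLUMNS:
                record[column] = int(text)
            else:
                record[column] = text
        json_file.write(json.dumps(record) + "\n")
        count += 1
    return count


if __name__ == "__main__":
    # Made-up sample row, for illustration only.
    sample = io.StringIO(
        "tag,ts,value_number,value_string,value_timestamp,status,manager,user_\n"
        "pump1.flow,2017-08-01T21:39:00Z,12.5,,,0,1,7\n"
    )
    out = io.StringIO()
    csv_to_json_lines(sample, out)
    print(out.getvalue(), end="")
```

The resulting file could then be imported with a statement along the lines of COPY eventhistory FROM 'file:///path/to/file.json'; the path is a placeholder and has to be readable by the CrateDB node.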

  • I just use a single node and the latest Docker image for testing. I tried the same with Elasticsearch on the same machine and achieved about 15 kHz. COPY FROM sounds good, thanks; I hope to find time to run that test. – Andreas Vogler Aug 01 '17 at 21:39