I'd like to insert huge amounts of data. What should I use: single INSERT INTO statements, or do I have to use bulk inserts? Is there something else? The reason I ask is that my CrateDB node's disk is only busy at 11kb/s on average, while the disk load is at 100% when using single inserts!

Furthermore, is something like INSERT IGNORE INTO supported? Can I just throw my data at CrateDB in bulk and have it ignore duplicate entries?

Thanks!

Cœur
claus

1 Answer

As you rightly guessed, bulk inserts give you the best performance. How much you gain mostly depends on the chosen "bulk size", i.e. how many records are sent per request. A batch of around 1000 records usually performs very well, but it's worth experimenting with the size, since the best value can depend on the hardware CrateDB runs on.
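To make the "bulk size" idea concrete, here is a minimal sketch that splits records into batches and builds one request body per batch for CrateDB's HTTP `_sql` endpoint, which accepts a `bulk_args` list next to `stmt`. The table name `t`, its columns, the helper names and the `http://localhost:4200/_sql` URL are assumptions for illustration, not something taken from your setup.

```python
import json
import urllib.request


def chunked(rows, size=1000):
    """Yield lists of at most `size` rows (each row becomes a list of values)."""
    batch = []
    for row in rows:
        batch.append(list(row))
        if len(batch) == size:
            yield batch
            batch = []
    if batch:  # last, possibly smaller, batch
        yield batch


def bulk_payloads(stmt, rows, size=1000):
    """Build one JSON request body per batch: {"stmt": ..., "bulk_args": [...]}."""
    for batch in chunked(rows, size):
        yield json.dumps({"stmt": stmt, "bulk_args": batch})


def send(payload, url="http://localhost:4200/_sql"):
    """POST one payload to a CrateDB node (the URL is an assumed local default)."""
    request = urllib.request.Request(
        url,
        data=payload.encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())
```

Usage would look like `for p in bulk_payloads("INSERT INTO t (id, name) VALUES (?, ?)", rows, size=1000): send(p)`. Changing `size` is the knob to play with when measuring throughput.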

Bulk inserts will also skip duplicate records automatically, provided the table has a primary key defined (how else would the DB know what a duplicate is?). This comes at a performance cost though, since every duplicate causes a needless lookup and a failed insert.
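If you want to see how many records were skipped, the bulk response can be inspected. The sketch below assumes the response shape of the HTTP endpoint for bulk operations, a `results` list with one `rowcount` per record, where `-2` marks a record that failed (for example because of a duplicate primary key). The function name and the sample response are made up for illustration.

```python
def summarize_bulk_response(response):
    """Return (inserted, failed) counts from a bulk response dict.

    Assumes each entry in response["results"] looks like {"rowcount": n},
    with n == -2 for a record that could not be inserted.
    """
    inserted = 0
    failed = 0
    for result in response.get("results", []):
        if result.get("rowcount") == -2:
            failed += 1
        else:
            inserted += 1
    return inserted, failed


# Hand-written sample response: the second record was rejected.
sample = {"results": [{"rowcount": 1}, {"rowcount": -2}, {"rowcount": 1}]}
inserted, failed = summarize_bulk_response(sample)
```

A high `failed` count is a hint that you are paying for a lot of needless lookups and might want to deduplicate on the client first.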

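Depending on what you want to achieve, you should also consider using an "insert or update" statement, which updates the existing record instead of failing when the primary key already exists.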
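As a rough sketch of what such a statement can look like: newer CrateDB versions accept an `ON CONFLICT` clause on `INSERT` (older versions used `ON DUPLICATE KEY UPDATE` instead, so check the documentation for your version). The helper below only builds the SQL string; the helper name, table `t` and its columns are hypothetical.

```python
def upsert_stmt(table, columns, key, update=True):
    """Build a parameterized INSERT with an ON CONFLICT clause.

    update=True  -> overwrite the non-key columns of the existing record
    update=False -> keep the existing record and silently skip the new one
    Assumes a CrateDB version that supports ON CONFLICT.
    """
    cols = ", ".join(columns)
    placeholders = ", ".join("?" for _ in columns)
    base = f"INSERT INTO {table} ({cols}) VALUES ({placeholders}) ON CONFLICT ({key})"
    assignments = ", ".join(f"{c} = excluded.{c}" for c in columns if c != key)
    if update and assignments:
        return f"{base} DO UPDATE SET {assignments}"
    return f"{base} DO NOTHING"


update_sql = upsert_stmt("t", ["id", "name"], "id")
ignore_sql = upsert_stmt("t", ["id", "name"], "id", update=False)
```

The `DO NOTHING` variant is the closest equivalent to the INSERT IGNORE behaviour asked about, and either statement can be used as the `stmt` of a bulk request.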

claus