0

Can anyone suggest how I might INSERT batched queries into an Apache Derby DB whilst still handling duplicate entries with good performance?

As a last resort to try and balance performance with the ability to handle duplicate keys on INSERT, I decided to let JDBC / Derby tell me when a batched query would result in a duplicate key, catch the error, and then continue with the remaining batched queries.

Unfortunately, as soon as an error is detected, it seems to clear out the whole batch of queries which means that I lose all of the remaining records.

Things I've tried:

  • Perform a programmatic query to determine if the entry is already in there. Abandoned this because the data may currently be in a batch, so it will be missed.
  • Use SQL queries to only INSERT if record is not already present. Abandoned this because it resulted in awful INSERT performance (down by almost 70%).
  • Let the DB tell me about duplicate keys, catch the error and continue processing batch. About to abandon this because it 'forgets' that there are other queries to process in the batch.

If only Derby supported INSERT IGNORE.

Thanks

jdie8274j
  • 155
  • 1
  • 10

1 Answers1

3

The JDBC says that it a driver may continue to execute batch statements after an error occurs:

If one of the commands in a batch update fails to execute properly, this method throws a BatchUpdateException, and a JDBC driver may or may not continue to process the remaining commands in the batch.

Now the derby docs state that in embedded mode batch processing stops if an error is encountered, whereas the network driver continues:

Treatment of error situations encountered during batch processing with java.sql.Statement, java.sql.PreparedStatement and java.sql.CallableStatement is different. With the embedded driver processing stops when an error is encountered; with the network client driver processing continues, but an appropriate value as defined in the java.sql.Statement api is returned in the resulting update count array.

But given that you want to perform a INSERT IGNORE you could instead simply use a MERGE statement which is supported by Derby.

wero
  • 32,544
  • 3
  • 59
  • 84
  • Hi @Wero, thank you for your response. Are you suggesting that I could run the application in network mode and connect on localhost? I looked at `merge`, but it doesn't seem to support my use case because it requires a source table. In two of my tables, I am inserting from application data and then in my lookup table, I am inserting based on rows in the other two. – jdie8274j Apr 22 '16 at 09:04
  • @jdie8274j you could try the network mode and see if it works. And wouldn't `(SELECT as id FROM dual)` work as source table? – wero Apr 22 '16 at 09:27
  • I don't think `dual` is supported by Derby unfortunately! I'm also trying to avoid `SELECT` queries because I'm inserting a lot of data, and I want to do it quickly. Hence my disappointment when I found that simply catching a duplicate key exception wouldn't work! – jdie8274j Apr 22 '16 at 09:33
  • @jdie8274j funny, in derby `dual` is `sysibm.sysdummy1`, http://stackoverflow.com/questions/20262352/create-table-table1-as-select-from-sysibm-sysdummy1 – wero Apr 22 '16 at 09:38
  • Well @wero, your `merge` idea is looking good for the two tables. Very good performance too. Do you have any idea how I can `merge insert` into a lookup table using the id's from the other two? – jdie8274j Apr 22 '16 at 10:15
  • `merge` is not ideal for batch inserts because it requires two tables whereas you might only have a `List` in many cases – Hooli Jul 29 '16 at 21:44