2

So I'm using Java with Apache Derby, and I'm trying to insert a record, but only if a record with the same key does not already exist, because all the values I want to exist in my code rather than the database I use derbys dummy table (analogous to DUAL for db2) this is the query I'm using (md5 is the primary key)

PreparedStatement stmt = conn.prepareStatement("INSERT INTO artwork (md5,image) " +
               "SELECT ?,?" +
               "FROM SYSIBM.SYSDUMMY1 " +
               "WHERE NOT EXISTS ( SELECT 1 FROM artwork WHERE md5=?)");

    stmt.setString(1, key);
    stmt.setBinaryStream(2, new ByteArrayInputStream(imageData), imageData.length);
    stmt.setString(3, key);

and it seemed to be working, however when I multi-thread the code so that two threads could be trying to insert the same artwork I get the get errors about entering duplicate value into index.

If I synchronize the method so that only one thread can call the method at the same time then I get no such errors but this defeats the purpose of adding the WHERE NOT EXISTS value in the first place.

So is my query not doing what I think it is doing, or am I misunderstanding a general concept here ?

NPE
  • 486,780
  • 108
  • 951
  • 1,012
Paul Taylor
  • 13,411
  • 42
  • 184
  • 351
  • Have you tried using transactions? – Romain Jan 09 '12 at 12:47
  • Don't see how that is relavent – Paul Taylor Jan 09 '12 at 12:49
  • I don't see how transactions will help in this situation. To the OP: is there anything wrong with just allowing the duplicate value errors to happen and ignoring them? It would accomplish the same purpose. – ean5533 Jan 09 '12 at 12:52
  • ... or not allowing them and handling exceptions?? – soulcheck Jan 09 '12 at 12:55
  • Possibly not but I dont understand why my code doesnt work as expected, and i think it is slower for a db to throw exceptions after an invalid insert than to not do the insrt. – Paul Taylor Jan 09 '12 at 12:58
  • 1
    And from @soulcheck's point, you could achieve atomic properties by using transactions (provided Derby goes all the way to SERIALIZABLE, which I'm not sure about). – Romain Jan 09 '12 at 13:04
  • @soulcheck Interesting, did not know that. Thanks. – ean5533 Jan 09 '12 at 13:05
  • @Romain I do set setAutoCommit(false) , and then commit() at the end, is that what you mean by a transactions – Paul Taylor Jan 09 '12 at 13:07
  • @soulcheck that is indeed interesting and suprising, Im sure Ive used this method on other dbs in the past – Paul Taylor Jan 09 '12 at 13:08
  • 1
    @user294896 Yes, though this could end up using an isolation level that is not sufficient (I don't know the default in Derby). – Romain Jan 09 '12 at 13:20
  • I still dont understand why its works single threaded. i.e if one thread tries to add the same row twice the Where clause seems to prevent the insert occurring, and hence the exception being thrown – Paul Taylor Jan 09 '12 at 16:14
  • @soulcheck: of course queries with subqueries are atomic in PostgreSQL (or Oracle, or DB2, ...). It's called "read consistency" and is usually done by implementing "multi version concurrency control" –  Jan 09 '12 at 16:23
  • @user294896 when single-threaded the artwork table contets can't chage in between testing the condition and inserting. In multi-threaded app things can happen in following sequence:1. thread t1 checks the condition, it's true; 2. context switches to thread t2 which checks the condition, it's true again; 3. context switches to t1, which proceeds with insert; 4. context switches to t2, which proceeds with insert and fails – soulcheck Jan 09 '12 at 16:24
  • @a_horse_with_no_name yeah, i really should've said 'not isolated enough', so good point. – soulcheck Jan 09 '12 at 16:35
  • @a_horse_with_no_name they're atomic in the acid sense (meaning they will either succeed or fail), they're not atomic in the sense that condition testing is guaranteed to happen with the same state of data as inserting (meaning that executing a query is actually happening in at least two steps, not one indivisible step). the OP's problem persists. postgres will make sure your query sees consistent data, but still doesn't guarantee there's no phantom reads (which is the case of OP's query) unless instructed to do so. – soulcheck Jan 09 '12 at 16:45
  • @soulcheck: I don't know which DBMS you have used so far but at least in Postgres, Oracle and DB2 the execution of a single statement is a single atomic operation and all "sub-statements" will see the same snapshot of the data (that is the "version" that was valid when the statement started) –  Jan 09 '12 at 17:23
  • @a_horse_with_no_name people on postgres mailing lists seem to disagree, unless i misunderstood them. [here's a post by Tom Lane - a postgres contributor](http://markmail.org/thread/2wex52lapbwlkjo5#query:+page:1+mid:xtwlp7aguq6226fs+state:results). – soulcheck Jan 09 '12 at 18:03
  • @soulcheck: no, Tom is actually supporting my point: "will only see rows that committed before the command started." Command is the *whole* statement here. Because of that the INSERT might fail as in the meantime someone else might have inserted a conflicting row. –  Jan 09 '12 at 18:44
  • @a_horse_with_no_name no he's not. He **explicitly** says it's not atomic. He doesn't argue what does 'atomic' mean, cause he assumes the meaning from context (and that's the same atomic as in atomic test-and-set not ACID-atomic). – soulcheck Jan 09 '12 at 20:58
  • @a_horse_with_no_name, soulcheck: Can we get a third party to weigh in on this argument? You both seem confident in your answers and I'd like to know what the final say is. – ean5533 Jan 10 '12 at 13:43
  • Please consider moving this conversation to chat. – Tim Post Jan 10 '12 at 13:58

1 Answers1

0

INFO: This solution does not work. See the comments for details. I left it here so who ever looks for a solution dosen't have to try this as well.

I haven't had the prodblem myself yet, but I would guess that you could get around it if you optimize the subquery out of it. Like this (I haven't tested it, maybe it needs improvement):

PreparedStatement stmt = conn.prepareStatement("INSERT INTO artwork (md5,image) " +
           "SELECT ?,?" +
           "FROM SYSIBM.SYSDUMMY1 left join artwork exi on ? = md5" +
           "WHERE exi.md5 IS NULL");

stmt.setString(1, key);
stmt.setBinaryStream(2, new ByteArrayInputStream(imageData), imageData.length);
stmt.setString(3, key);
Angelo Fuchs
  • 9,825
  • 1
  • 35
  • 72
  • Hi thanks just tried it and it didnt work. I think if Ive understood the comments above the problem is not the use of a subquery but the use of a query to supply data to the insert, and anothe thread can add in the row inbetween the first thread doing the SELECT and doing the INSERT based on the SELECT. – Paul Taylor Jan 09 '12 at 16:41
  • @user294896 you're correct it has exactly the same problem as the original query. – soulcheck Jan 09 '12 at 16:46
  • @user294896 Okay, I edited a message in my answer to flag this answer as not working. – Angelo Fuchs Jan 12 '12 at 16:37