
I have an environment with (for simplicity) two tables, A and B. Table B has a foreign key constraint referencing table A's primary key. I now have several forked processes, each performing a transaction like the following (a rough SQL sketch appears after the list):

  1. Load data into local memory (let's say user input)
  2. Select all rows (~1000) from table A
  3. Compute both set differences between the selected rows and the data in local memory, as well as their intersection
  4. Delete references from table B where the corresponding row in table A is not present in local memory
  5. Delete rows from table A that are no longer referenced from table B and not present in local memory
  6. Update rows in tables A and B that are present in memory with the data from memory
  7. Insert new rows into tables A and B for data only present locally
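
For concreteness, here is a rough sketch of the SQL issued by steps 2 and 4–7. The schema is hypothetical (columns `id`, `payload`, `a_id`); step 3, the diffing, happens in application code between the SELECT and the writes.

```sql
-- Hypothetical schema: A(id PRIMARY KEY, payload), B(id, a_id REFERENCES A(id), payload)

-- Step 2: read the current state of A
SELECT id, payload FROM A;

-- Step 4: remove references whose parent row is missing from local memory
DELETE FROM B WHERE a_id IN (/* ids read from A but missing locally */);

-- Step 5: remove parents that are now unreferenced and also missing locally
DELETE FROM A
 WHERE id IN (/* ids read from A but missing locally */)
   AND id NOT IN (SELECT a_id FROM B);

-- Step 6: update the intersection (one statement per row; likewise for B)
UPDATE A SET payload = :payload WHERE id = :id;

-- Step 7: insert rows only present locally, parents before children
INSERT INTO A (id, payload) VALUES (:id, :payload);
INSERT INTO B (id, a_id, payload) VALUES (:id, :a_id, :payload);
```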

As mentioned, table A has roughly 1000 rows and table B around 5000. Each transaction touches around 30% of the entries.

The reason I compute the set differences myself is that I cannot use a MERGE or UPSERT, since I don't know which DBMS I will be talking to (it is abstracted away by the company's DB layer): Oracle or PostgreSQL.
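
To illustrate the portability issue (again with the hypothetical `A(id, payload)` schema): Oracle's upsert is MERGE, while PostgreSQL (9.5 and later) offers INSERT ... ON CONFLICT instead, so there is no single statement that runs unchanged against both.

```sql
-- Oracle
MERGE INTO A a
USING (SELECT :id AS id, :payload AS payload FROM dual) src
   ON (a.id = src.id)
 WHEN MATCHED THEN UPDATE SET a.payload = src.payload
 WHEN NOT MATCHED THEN INSERT (id, payload) VALUES (src.id, src.payload);

-- PostgreSQL 9.5+
INSERT INTO A (id, payload) VALUES (:id, :payload)
ON CONFLICT (id) DO UPDATE SET payload = EXCLUDED.payload;
```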

The problem I see when two such transactions run in parallel is:

  • Both read the current state, see that an entry with primary key X does not yet exist in table A, and both try to insert it. Whoever commits last will fail.
  • Both read the current state and each deletes a different reference from table B, together leaving a row in table A without any remaining reference. That row would then need to be deleted, but since each transaction still sees the reference held by the other, the row in table A is never deleted, leaving a dead row.
  • Both read the current state. Transaction 1 sees a row in table A without any reference and deletes it. Transaction 2 sees the same row as already present and merely adds a new reference to it. Transaction 1 commits. The commit of transaction 2 fails due to an FK violation.

I think the only isolation level that would help me is SERIALIZABLE, but I cannot handle serialization errors because the company's DB layer does not provide proper information on why a transaction failed. I also don't see how table locking would help, because my problem is the (inconsistent) read of table A at the beginning of each transaction, and reads are never blocked.
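
For reference, this is roughly what the SERIALIZABLE approach would look like; the reason it is not workable here is the retry it requires, which means distinguishing a serialization failure (SQLSTATE 40001 in PostgreSQL, ORA-08177 in Oracle) from a genuine error, and that information is not exposed by the DB layer.

```sql
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;  -- supported by both Oracle and PostgreSQL
-- ... steps 2-7 ...
COMMIT;  -- may fail with a serialization error; the whole transaction would then have to be retried
```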

How can I best solve my problem?

sigy
  • "select all rows from table A". How many could that possibly be 1? 100? 100000000000? – Rene Aug 22 '16 at 11:59
  • Create a table with one row and use `select ... for update` on this row to enter a critical section. With the limitations you have, there is no effective, scalable solution. – ibre5041 Aug 22 '16 at 12:04
  • How many rows are loaded into local memory? How big are tables A and B? Do they have primary keys? What percentage of the PK values in A and B will be in any one transaction's local memory? I.e., is each transaction going to affect 50% of the rows? 5%? 0.05%? What's the order of magnitude? You need to serialize in your solution to avoid the problems you cited, but depending on the answers to the above questions, you may be able to easily design things to let a large bulk of the work happen in parallel. – Matthew McPeak Aug 22 '16 at 12:37
  • Good question... I don't know the answers myself yet. I will try to find this out and update my question once I do. – sigy Aug 22 '16 at 12:42
  • @MatthewMcPeak Table A has roughly 1000 rows, table B around 3000. Table A has a primary key; table B doesn't. Each transaction should probably touch around 30% of the rows. – sigy Aug 29 '16 at 13:32
  • In that case, I'd go with @ibre5041: create a dummy table with one record and `SELECT..FOR UPDATE` that one record before the transaction. That will serialize all the transactions. If each transaction affected a smaller percentage of the rows, there are more sophisticated variants of that which would allow more or most of the transactions to work in parallel. But with each transaction hitting 30% of the rows and the updates being so small (300-1K rows), I think you'll be better off keeping it simple. At 30%, you're not likely to get more than 2-3 transactions in parallel no matter what. – Matthew McPeak Aug 29 '16 at 14:41
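
A minimal sketch of the one-row lock table suggested in the comments above (the table and column names are made up, and how the transaction is opened and committed depends on the company's DB layer):

```sql
-- Created once, outside the transactions:
-- CREATE TABLE app_mutex (id INTEGER PRIMARY KEY);
-- INSERT INTO app_mutex (id) VALUES (1);

-- At the start of every transaction, before reading table A:
SELECT id FROM app_mutex WHERE id = 1 FOR UPDATE;
-- Blocks until any other transaction holding this row lock commits or rolls back,
-- so steps 2-7 of the forked processes run strictly one after another.

-- ... steps 2-7 ...

COMMIT;  -- releases the row lock
```

Because the `SELECT ... FOR UPDATE` happens before the read of table A, every transaction sees the state left behind by the previous one, which avoids all three races described in the question at the cost of full serialization.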

0 Answers