In my scenario, a process engine allows only one instance of a process execution to be run at the same time, refusing others. Since my application is planned to be deployed to clusters, it is important that decisions taken in a node reflect the global system status. I understand that there are plenties of ways to achieve my goal.
For that we are implementing a database lock leveraging standard ACID transactions. We have a table PROCESS_RUNS where we record current and historical executions. Better say: we use the history table as a shared lock mechanism, relying on the database to handle concurrency. Note: IDs are generated not by means of DB sequences
| ID | PROCESS_ID | STATUS | START_TIME | END_TIME |
|----|------------|---------|------------|----------|
| 1 | PROCESS1 | RUNNING | -- | -- |
| 2 | PROCESS1 | OK | -- | -- |
| 3 | PROCESS2 | RUNNING | -- | -- |
| 4 | PROCESS3 | KO | -- | -- |
Only constraint here is that, given column PROCESS_ID, only one instance of each process may be in RUNNING status. The application handles all the logic (including node power failures) to keep the database consistent. In the example above, PROCESS1 and PROCESS2 are running, so a new instance of PROCESS3 or PROCESS4 can be inserted into running status, but the application must refuse to INSERT
a new instance of PROCESS1.
Consider the following Hibernate code:
@Override
@Transactional(propagation = Propagation.REQUIRES_NEW, isolation = Isolation.SERIALIZABLE)
public boolean checkNotRunningAndInsert(String processId, ProcessRun run)
{
DetachedCriteria criteria = forClass(getEntityClass()).add(eq("processId", requireNonNull(processId, "processId is required")))//
.add(eq("status", ProcessRunStatus.RUNNING))//
.setProjection(rowCount());
long activeCount = DataAccessUtils.longResult(getHibernateTemplate().findByCriteria(criteria));
if (activeCount == 0)
{
// OK, return
getHibernateTemplate().save(requireNonNull(run, "run is required"));
return true;
}
else if (activeCount == 1)
{
// OK, process running
return false;
}
else // if (activeCount>1)
{
log.error("Looks like more than 1 instance of the process {} is running. This is a bug", processId);
return false;
}
}
The above code does the following:
SELECT COUNT(*) FROM PROCESS_RUNS WHERE PROCESS_ID = ? AND STATUS = 'RUNNING'
Then if there are no rows it will issue
INSERT INTO PROCESS_RUNS (....,STATUS,....) VALUES (...,'RUNNING',...)
Otherwise it returns with a false
. In case the DBMS refuses to commit the transaction, a PessimisticLockingFailureException
is issued by Spring's @Transactional
and handled in my business code by simply reissuing the transaction a few more times until it exits cleanly. The calling code is synchronized
so when my application runs on a single server I have 100% guarantee that no two threads can access the method at the same time.
The same cannot be said for clustered installations. I have made myself a JUnit experiment proving that on Oracle (and perhaps Oracle only among Mysql, MS SQL and Postgres) the serialization constraint is sometimes not honoured when two threads (locking on different objects) perform the same transaction at once.
From a SERIALIZABLE
isolation event I expect that if two transactions run at the same time either one of them is queued* or one of them is canceled, especially after the first did the SELECT. From textbook, given T1 and T2 the following must be guaranteed to be impossible
T1: SELECT
T2: SELECT
T2: INSERT
T1: INSERT
T1: COMMIT
T2: COMMIT
If I debug the above code and perform steps, thus manually guaranteeing the execution order of the above statements in a few seconds time, Oracle behaves as expected and I have one of the two transactions rolled back with explicit SQLException. If I retry the second transaction, I can see the first having committed and then the code returns false, as it is expected to run
T1: SELECT
T2: SELECT
T2: INSERT
T1: INSERT
T1: COMMIT
T2: ROLLBACK
T3: SELECT
T3: COMMIT EMPTY
Unfortunately, and here comes my question, I have seen that if I let JUnit run in the wild I get unpredictable results. Sometimes my test succeeds after returning false in one of the two threads, often no invocation returns false and I get two rows of same process in RUNNING
status inserted. The result is not predictable, e.g. happens randomly approximately on a 50/50 odds. I can also see the database containing two rows of the process, more than having JUnit assert that only one thread returned with a false
The last violates my application constraints.
Here is a log of Show SQL output in wrong case
Hibernate: delete from PROCESS_RUNS
Hibernate: select count(*) as y0_ from PROCESS_RUNS this_ where this_.PROCESS_ID=? and this_.STATUS=?
Hibernate: select count(*) as y0_ from PROCESS_RUNS this_ where this_.PROCESS_ID=? and this_.STATUS=?
Hibernate: insert into PROCESS_RUNS (CONTEXT_CLASS, END_TIME, QUEUE_TIME, ESTIMATED_QUEUE_WAIT, EXCEPTION_CLASS, MODULE_ID, NODE_ID, ORGANIZATION_ID, PRIORITY, PROCESS_ID, REFERENCE_ID, START_TIME, STATUS, TRIGGER_ID, TRIGGER_TYPE, USER_ID, RUN_ID, END_CONTEXT, EXECUTION_EXCEPTION, START_CONTEXT, TRIGGER_CONTEXT) values (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
Hibernate: insert into PROCESS_RUNS (CONTEXT_CLASS, END_TIME, QUEUE_TIME, ESTIMATED_QUEUE_WAIT, EXCEPTION_CLASS, MODULE_ID, NODE_ID, ORGANIZATION_ID, PRIORITY, PROCESS_ID, REFERENCE_ID, START_TIME, STATUS, TRIGGER_ID, TRIGGER_TYPE, USER_ID, RUN_ID, END_CONTEXT, EXECUTION_EXCEPTION, START_CONTEXT, TRIGGER_CONTEXT) values (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
Correct case
Hibernate: delete from PROCESS_RUNS
Hibernate: select count(*) as y0_ from PROCESS_RUNS this_ where this_.PROCESS_ID=? and this_.STATUS=?
Hibernate: select count(*) as y0_ from PROCESS_RUNS this_ where this_.PROCESS_ID=? and this_.STATUS=?
Hibernate: insert into PROCESS_RUNS (CONTEXT_CLASS, END_TIME, QUEUE_TIME, ESTIMATED_QUEUE_WAIT, EXCEPTION_CLASS, MODULE_ID, NODE_ID, ORGANIZATION_ID, PRIORITY, PROCESS_ID, REFERENCE_ID, START_TIME, STATUS, TRIGGER_ID, TRIGGER_TYPE, USER_ID, RUN_ID, END_CONTEXT, EXECUTION_EXCEPTION, START_CONTEXT, TRIGGER_CONTEXT) values (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
Hibernate: insert into PROCESS_RUNS (CONTEXT_CLASS, END_TIME, QUEUE_TIME, ESTIMATED_QUEUE_WAIT, EXCEPTION_CLASS, MODULE_ID, NODE_ID, ORGANIZATION_ID, PRIORITY, PROCESS_ID, REFERENCE_ID, START_TIME, STATUS, TRIGGER_ID, TRIGGER_TYPE, USER_ID, RUN_ID, END_CONTEXT, EXECUTION_EXCEPTION, START_CONTEXT, TRIGGER_CONTEXT) values (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
14:47:25.441 [pool-2-thread-2] ERROR org.hibernate.engine.jdbc.spi.SqlExceptionHelper - ORA-08177: impossibile serializzare l'accesso per questa transazione
14:47:25.451 [pool-2-thread-2] ERROR org.hibernate.internal.SessionImpl - HHH000346: Error during managed flush [could not execute statement]
Hibernate: select count(*) as y0_ from PROCESS_RUNS this_ where this_.PROCESS_ID=? and this_.STATUS=?
I have talked to my team and we discussed about row locking, but in a COUNT statement there is actually no row to lock.
I have tried to amend my code a few times:
- Adding a 100ms sleep after the SELECT seems to give consistency to the wrong behaviour (i.e. I am always getting no false from threads and two rows in DBeaver).
- Re-issuing the COUNT query after INSERT to check that there is exactly one row in RUNNING status adds nothing
- Removing sleep keeps error consistent
- Changing isolation level to READ_UNCOMMITTED requires altering the entire session, possibly with devastating effects
- Changing isolation to READ_COMMITTED is not what I want in my case
I don't know what else to try. I need to fix this by either Java code or Hibernate configuration. I am also required that works with Oracle works also for MS SQL, Mysql and Postgres
Question is: how do I make sure that my code inserts a new row in a table if and only if no other row of the same PROCESS_ID is in status RUNNING
, with regards to Oracle 12c and transactional isolation?
- It has been repeatedly remarked that in reality no database serializes transactions, but rather grants that if both succeed the result of the operation is consistent with serial execution
Edit 1: perhaps Oracle is working as Oracle is designed. If you see the transaction from the SQL point of view, you see only two SELECT
s and then two INSERT
s running on different data. The outcome of a SELECT does not change the system state, none of the results of the SELECT are used in a VALUE
of the INSERT
. So the outcome of serial execution is two inserts. The real problem, which other databases correctly enforce, is that the result of SELECT
is used by caller code to take a decision on whether or not to perform INSERT
, and that is something Oracle DB will never be able to know in advance. I remember my DB2 classes (namely Data Bases 2nd
, not IBM DB2) where a similar issue was discussed about implementation of DBMSes
The above is likely to be true according to research (emphasis kept)
Oracle Database permits a serializable transaction to modify a row only if changes to the row made by other transactions were already committed when the serializable transaction began. The database generates an error when a serializable transaction tries to update or delete data changed by a different transaction that committed after the serializable transaction began: