
In my scenario, a process engine allows only one instance of each process to run at a time, refusing further executions. Since my application is meant to be deployed to a cluster, it is important that decisions taken on one node reflect the global system state. I understand there are plenty of ways to achieve this goal.

For that we are implementing a database lock leveraging standard ACID transactions. We have a table PROCESS_RUNS where we record current and historical executions. Better put: we use the history table as a shared lock mechanism, relying on the database to handle concurrency. Note: IDs are not generated via DB sequences.

| ID | PROCESS_ID | STATUS  | START_TIME | END_TIME |
|----|------------|---------|------------|----------|
| 1  | PROCESS1   | RUNNING | --         | --       |
| 2  | PROCESS1   | OK      | --         | --       |
| 3  | PROCESS2   | RUNNING | --         | --       |
| 4  | PROCESS3   | KO      | --         | --       |

The only constraint here is that, for a given PROCESS_ID, only one instance of each process may be in RUNNING status. The application handles all the logic (including node power failures) needed to keep the database consistent. In the example above, PROCESS1 and PROCESS2 are running, so a new instance of PROCESS3 or PROCESS4 may be inserted in RUNNING status, but the application must refuse to INSERT a new instance of PROCESS1.
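Stated as a predicate, the constraint says that grouping the RUNNING rows by PROCESS_ID must never yield a count above one. A minimal in-memory restatement over the sample rows above (plain Java for illustration, not database code):

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class InvariantCheck {
    public static void main(String[] args) {
        // The sample table above, flattened to (PROCESS_ID, STATUS) pairs.
        List<String[]> rows = List.of(
            new String[] {"PROCESS1", "RUNNING"},
            new String[] {"PROCESS1", "OK"},
            new String[] {"PROCESS2", "RUNNING"},
            new String[] {"PROCESS3", "KO"});

        // The constraint: at most one RUNNING row per PROCESS_ID.
        Map<String, Long> runningPerProcess = rows.stream()
            .filter(r -> "RUNNING".equals(r[1]))
            .collect(Collectors.groupingBy(r -> r[0], Collectors.counting()));

        boolean holds = runningPerProcess.values().stream().allMatch(c -> c <= 1);
        System.out.println("invariant holds: " + holds);
    }
}
```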

Consider the following Hibernate code:

```java
@Override
@Transactional(propagation = Propagation.REQUIRES_NEW, isolation = Isolation.SERIALIZABLE)
public boolean checkNotRunningAndInsert(String processId, ProcessRun run)
{
    DetachedCriteria criteria = forClass(getEntityClass()).add(eq("processId", requireNonNull(processId, "processId is required")))//
                                                          .add(eq("status", ProcessRunStatus.RUNNING))//
                                                          .setProjection(rowCount());

    long activeCount = DataAccessUtils.longResult(getHibernateTemplate().findByCriteria(criteria));

    if (activeCount == 0)
    {
        // Not running: insert a new RUNNING row
        getHibernateTemplate().save(requireNonNull(run, "run is required"));
        return true;
    }
    else if (activeCount == 1)
    {
        // OK, process already running
        return false;
    }
    else // activeCount > 1
    {
        log.error("Looks like more than 1 instance of the process {} is running. This is a bug", processId);
        return false;
    }
}
```

The above code does the following:

```sql
SELECT COUNT(*) FROM PROCESS_RUNS WHERE PROCESS_ID = ? AND STATUS = 'RUNNING'
```

Then if there are no rows it will issue

```sql
INSERT INTO PROCESS_RUNS (....,STATUS,....) VALUES (...,'RUNNING',...)
```

Otherwise it returns false. In case the DBMS refuses to commit the transaction, Spring's @Transactional surfaces a PessimisticLockingFailureException, which my business code handles by simply reissuing the transaction a few more times until it exits cleanly. The calling code is synchronized, so when my application runs on a single server I have a 100% guarantee that no two threads can enter the method at the same time.
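The retry logic lives in plain business code; sketched roughly (a hypothetical helper, not the actual production class), it amounts to:

```java
import java.util.function.Supplier;

public class RetryExample {
    // Hypothetical retry helper: re-issues the transactional call a few
    // times when the DBMS refuses to serialize the transaction (surfaced by
    // Spring as a PessimisticLockingFailureException in the scenario above).
    static <T> T retry(int maxAttempts, Supplier<T> tx) {
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return tx.get();
            } catch (RuntimeException e) { // e.g. PessimisticLockingFailureException
                last = e;
            }
        }
        throw last; // give up after maxAttempts
    }

    public static void main(String[] args) {
        // Simulate a transaction that fails twice before succeeding.
        int[] calls = {0};
        boolean result = retry(5, () -> {
            if (++calls[0] < 3) throw new RuntimeException("ORA-08177");
            return true;
        });
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```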

The same cannot be said for clustered installations. I have written a JUnit experiment proving that on Oracle (and perhaps Oracle only, among MySQL, MS SQL and Postgres) the serialization constraint is sometimes not honoured when two threads perform the same transaction at once, each locking on a different object.

From SERIALIZABLE isolation I expect that if two transactions run at the same time, either one of them is queued or one of them is cancelled, especially after the first has done its SELECT. By the textbook, given T1 and T2, the following schedule must be impossible:

```
T1: SELECT
T2: SELECT
T2: INSERT
T1: INSERT
T1: COMMIT
T2: COMMIT
```
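To make the schedule concrete, here is a step-by-step replay against an in-memory list standing in for the RUNNING rows (illustration only, no database involved):

```java
import java.util.ArrayList;
import java.util.List;

public class ForbiddenSchedule {
    public static void main(String[] args) {
        // In-memory stand-in for the RUNNING rows of PROCESS_RUNS; this is
        // not database code, only a replay of the schedule above.
        List<String> running = new ArrayList<>();

        long t1Count = running.stream().filter("PROCESS1"::equals).count(); // T1: SELECT
        long t2Count = running.stream().filter("PROCESS1"::equals).count(); // T2: SELECT

        if (t2Count == 0) running.add("PROCESS1"); // T2: INSERT
        if (t1Count == 0) running.add("PROCESS1"); // T1: INSERT

        // T1: COMMIT, T2: COMMIT -- both "transactions" succeed and the
        // one-RUNNING-row-per-process invariant is broken:
        System.out.println("RUNNING rows for PROCESS1: " + running.size());
    }
}
```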

If I debug the above code step by step, thus manually forcing the execution order of the above statements over a few seconds, Oracle behaves as expected: one of the two transactions is rolled back with an explicit SQLException. If I then retry the second transaction, it sees that the first has committed and the code returns false, as expected:

```
T1: SELECT
T2: SELECT
T2: INSERT
T1: INSERT
T1: COMMIT
T2: ROLLBACK
T3: SELECT
T3: COMMIT EMPTY
```

Unfortunately, and here comes my question, when I let JUnit run in the wild I get unpredictable results. Sometimes the test succeeds, with one of the two threads returning false; often no invocation returns false and two rows for the same process are inserted in RUNNING status. The outcome is random, with roughly 50/50 odds. Beyond JUnit asserting that exactly one thread returned false, I can also see the two rows sitting in the database.

The latter violates my application's constraints.

Here is the show-SQL log for the wrong case:

```
Hibernate: delete from PROCESS_RUNS
Hibernate: select count(*) as y0_ from PROCESS_RUNS this_ where this_.PROCESS_ID=? and this_.STATUS=?
Hibernate: select count(*) as y0_ from PROCESS_RUNS this_ where this_.PROCESS_ID=? and this_.STATUS=?
Hibernate: insert into PROCESS_RUNS (CONTEXT_CLASS, END_TIME, QUEUE_TIME, ESTIMATED_QUEUE_WAIT, EXCEPTION_CLASS, MODULE_ID, NODE_ID, ORGANIZATION_ID, PRIORITY, PROCESS_ID, REFERENCE_ID, START_TIME, STATUS, TRIGGER_ID, TRIGGER_TYPE, USER_ID, RUN_ID, END_CONTEXT, EXECUTION_EXCEPTION, START_CONTEXT, TRIGGER_CONTEXT) values (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
Hibernate: insert into PROCESS_RUNS (CONTEXT_CLASS, END_TIME, QUEUE_TIME, ESTIMATED_QUEUE_WAIT, EXCEPTION_CLASS, MODULE_ID, NODE_ID, ORGANIZATION_ID, PRIORITY, PROCESS_ID, REFERENCE_ID, START_TIME, STATUS, TRIGGER_ID, TRIGGER_TYPE, USER_ID, RUN_ID, END_CONTEXT, EXECUTION_EXCEPTION, START_CONTEXT, TRIGGER_CONTEXT) values (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
```

And for the correct case:

```
Hibernate: delete from PROCESS_RUNS
Hibernate: select count(*) as y0_ from PROCESS_RUNS this_ where this_.PROCESS_ID=? and this_.STATUS=?
Hibernate: select count(*) as y0_ from PROCESS_RUNS this_ where this_.PROCESS_ID=? and this_.STATUS=?
Hibernate: insert into PROCESS_RUNS (CONTEXT_CLASS, END_TIME, QUEUE_TIME, ESTIMATED_QUEUE_WAIT, EXCEPTION_CLASS, MODULE_ID, NODE_ID, ORGANIZATION_ID, PRIORITY, PROCESS_ID, REFERENCE_ID, START_TIME, STATUS, TRIGGER_ID, TRIGGER_TYPE, USER_ID, RUN_ID, END_CONTEXT, EXECUTION_EXCEPTION, START_CONTEXT, TRIGGER_CONTEXT) values (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
Hibernate: insert into PROCESS_RUNS (CONTEXT_CLASS, END_TIME, QUEUE_TIME, ESTIMATED_QUEUE_WAIT, EXCEPTION_CLASS, MODULE_ID, NODE_ID, ORGANIZATION_ID, PRIORITY, PROCESS_ID, REFERENCE_ID, START_TIME, STATUS, TRIGGER_ID, TRIGGER_TYPE, USER_ID, RUN_ID, END_CONTEXT, EXECUTION_EXCEPTION, START_CONTEXT, TRIGGER_CONTEXT) values (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
14:47:25.441 [pool-2-thread-2] ERROR org.hibernate.engine.jdbc.spi.SqlExceptionHelper - ORA-08177: impossibile serializzare l'accesso per questa transazione

14:47:25.451 [pool-2-thread-2] ERROR org.hibernate.internal.SessionImpl - HHH000346: Error during managed flush [could not execute statement]
Hibernate: select count(*) as y0_ from PROCESS_RUNS this_ where this_.PROCESS_ID=? and this_.STATUS=?
```

(The ORA-08177 message is localized in Italian above; in English it reads "can't serialize access for this transaction".)

I have talked to my team and we discussed row locking, but a COUNT statement has no rows to lock.

I have tried to amend my code a few times:

  • Adding a 100 ms sleep after the SELECT makes the wrong behaviour consistent (i.e. no thread ever returns false and I always see two rows in DBeaver).
  • Re-issuing the COUNT query after the INSERT, to check that there is exactly one row in RUNNING status, changes nothing.
  • Removing the sleep keeps the error consistent.
  • Changing the isolation level to READ_UNCOMMITTED would require altering the entire session, possibly with devastating effects.
  • Changing the isolation to READ_COMMITTED is not what I want in this case.

I don't know what else to try. I need to fix this either in Java code or in Hibernate configuration. I am also required to ensure that whatever works with Oracle also works with MS SQL, MySQL and Postgres.

The question is: how do I make sure that my code inserts a new row into the table if and only if no other row with the same PROCESS_ID is in RUNNING status, with regard to Oracle 12c and transaction isolation?

  • It has been repeatedly remarked that in reality no database literally serializes transactions; rather, it guarantees that if both transactions succeed, the result is consistent with some serial execution.

Edit 1: perhaps Oracle is working exactly as designed. Seen purely from the SQL point of view, there are only two SELECTs and then two INSERTs operating on different data. The outcome of a SELECT does not change the system state, and none of the SELECT's results are used in the VALUES of the INSERT, so the outcome of a serial execution would also be two inserts. The real problem, which the other databases do catch, is that the result of the SELECT is used by the calling code to decide whether or not to perform the INSERT, and that is something the Oracle DB can never know in advance. I remember a similar issue about DBMS implementations being discussed in my Databases 2 classes (the university course, not IBM DB2).

The above appears to be confirmed by Oracle's documentation (emphasis kept):

Oracle Database permits a serializable transaction to modify a row only if changes to the row made by other transactions were already committed when the serializable transaction began. The database generates an error when a serializable transaction tries to update or delete data changed by a different transaction that committed after the serializable transaction began:

  • `given column PROCESS_ID, only one instance of each process may be in RUNNING status` - why don't you just create a unique constraint on `(process_id, status='RUNNING')`? That way the database does not allow two records with the same ID and `status='RUNNING'`, so you don't need to serialize transactions or do such complicated and tricky synchronization tasks; just catch the `DUPLICATE EXCEPTION` in your code. – krokodilko Dec 22 '17 at 13:59
  • *Just* is too simple. As a design policy, structures have to be replicated to all four databases as they are, i.e. we don't adopt database-specific facilities, for the sake of testability and portability. Our **products** are installed at multiple locations. The difference between product and project is that a project is a single instance you can customize and optimize the way you like; adopting DB-specific structures makes testing more complex and is currently forbidden here. I am looking for a different solution for now, sorry. – usr-local-ΕΨΗΕΛΩΝ Dec 22 '17 at 14:06
  • 1
    Instead of sleeping for 100 ms, how about sleeping for random x ms, and doing a check again. Might help with your chances for not getting a duplicate. Alternatively you can do a 2 step approach. Create a staging process table with primary key [all db have PK] process id. At any given time only 1 process id can be inserted in this table & other's will get a `DUPLICATE EXCEPTION`.In the step 2 phase if the record get's inserted in this staging table you move to the main txn table. Also you need to delete this record once you update that process to `STOPPED`/`OK` in the main txn table. – pOrinG Dec 22 '17 at 15:13
  • A better design approach would be to use the staging process table as the main process table (with PROCESS_ID as PK) and keep the old one as history, for the sake of process uniqueness. A table-level PK guarantees it. Once a process finishes you move it to history and you are done. – pOrinG Dec 22 '17 at 15:33
  • I like the staging idea a lot. However, I am eager to **understand** why Oracle doesn't honour the serialization constraint, especially on such a random basis. Everyone here would be happier if Oracle threw error ORA-08177 consistently, rather than us changing the design of the process engine (due to be redesigned in 2018 for other reasons). – usr-local-ΕΨΗΕΛΩΝ Dec 22 '17 at 15:41
  • Well, I think there is a gap in understanding what `Serializable` transaction does in oracle. Maybe @krokodilko can help us understand better here. – pOrinG Dec 22 '17 at 15:54
  • @pOrinG see my edit. I start to think that Oracle is designed to allow multiple inserts despite my idea of transaction – usr-local-ΕΨΗΕΛΩΝ Dec 22 '17 at 16:04
  • @usr-local-ΕΨΗΕΛΩΝ Aah. Thanks, I think that solves the doubt !! – pOrinG Dec 22 '17 at 16:07
  • 1
    @usr-local-ΕΨΗΕΛΩΝ yes, this is explained in the excerpt from documentation you have appended to the question: `The database generates an error when a serializable transaction tries to update or delete data changed by a different transaction that committed after the serializable transaction began` - ORA-08177: `Cannot serialize access for this transaction` is thrown when the serializable transaction tries to update or delete only, **but not INSERT**. So, Oracle allows multile inserts in the serializable isolation mode. – krokodilko Dec 22 '17 at 17:27

1 Answer


It looks like this is simply how Oracle is designed, and my scenario is a WONTFIX / WORKS-AS-EXPECTED case. Let me describe why.

While we agree that, from a high-level point of view, the actual outcome is not equivalent to any sequential execution, Oracle is simply unaware of that.

In the comments, we already highlighted that Oracle guarantees isolation only for updated and deleted data. Simply put:

  1. Oracle has nothing to lock when the SELECT returns no rows
  2. An INSERT is not an update

I would argue that the index update could fall under transactional protection, but that is out of scope here.

I would also speculate that using a database sequence could trigger the error, since the sequence itself is updated; but I haven't tried it and, most importantly, Hibernate caches sequence values.
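For reference, the caching that the last sentence refers to is configured in a standard JPA mapping. A hypothetical example (generator and sequence names invented, not from the project):

```java
// Hypothetical mapping: with allocationSize = 50, Hibernate reserves a
// block of 50 sequence values per fetch and hands out IDs from memory,
// so most INSERTs never touch the database sequence at all.
@Id
@GeneratedValue(strategy = GenerationType.SEQUENCE, generator = "runSeq")
@SequenceGenerator(name = "runSeq", sequenceName = "PROCESS_RUNS_SEQ", allocationSize = 50)
private Long runId;
```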

Conclusion

Oracle implements protection against concurrent execution only for UPDATEs and DELETEs. If I am not happy with that, I must change database, e.g. to MS SQL Server, which does offer this kind of protection.
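For illustration only: the semantics I was after are the distributed analogue of `ConcurrentHashMap.putIfAbsent`, where the check and the insert happen as one atomic step. A minimal single-JVM sketch (a restatement of the required atomicity, not a fix for the clustered case):

```java
import java.util.concurrent.ConcurrentHashMap;

public class AtomicGuard {
    public static void main(String[] args) {
        // Single-JVM analogue of the desired semantics: putIfAbsent performs
        // the check and the insert atomically, which is exactly what the
        // SELECT COUNT + INSERT pair fails to do under Oracle's SERIALIZABLE
        // isolation.
        ConcurrentHashMap<String, String> running = new ConcurrentHashMap<>();
        boolean first  = running.putIfAbsent("PROCESS1", "RUNNING") == null;
        boolean second = running.putIfAbsent("PROCESS1", "RUNNING") == null;
        System.out.println(first + " " + second);
    }
}
```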
