Avoid JVM stop-the-world in critical section of code

Question

I'm trying to solve a problem where I need mutual exclusion among multiple processes running on different machines. I want to ensure that there is always one and only one process executing the critical section. To achieve that, I maintain a lock in a database which expires with time. The expiry is there to avoid starvation if the lock holder dies.

Now, given that the JVM can enter a stop-the-world (STW) pause at any time, a process that stays paused long enough will outlive its lock, and another process can then enter the critical section. I want to know:

Is there a way a thread can be notified/killed before the JVM starts an STW pause?

Is there a way I can configure the JVM not to do STW while a thread is in the critical section of the code?

Is there a way an application could specify the 'safe points' to the JVM?

Is there a way I can configure the JVM to crash instead of going into STW?

PS: I understand that I can set an insanely large timeout on the locks I'm creating, based on the JVM configuration. That would ensure that a JVM STW pause finishes before the lock expires. But that is not future-proof and is not the point of discussion. I'm also not looking for ways to reduce the number or duration of STW pauses, as they do not give the guarantees I'm looking for.

There are techniques to reduce JVM pauses, but there is no way to disable them. STW pauses are required for normal JVM operation. Furthermore, not only JVM can pause your application, but the underlying operating system, too, unless you run some kind of real-time JVM on top of a real-time OS. — apangin, Jan 17 '18 at 01:18
@apangin Thanks for the comment. BTW I'm not looking to disable STW. I'm looking for ways to get notified, to have STW invoke some action on critical threads, or to have the application specify a safe point at which to run STW. Thanks for mentioning OS pauses. I understand the OS can also cause a pause, but I'm not looking for solutions for OS pauses at this time. — rajneesh2k10, Jan 17 '18 at 01:31
Care to comment on the down vote? I would like to improve my question if it lacks detail or could be phrased better, or maybe just learn for next time. — rajneesh2k10, Jan 17 '18 at 01:34
Seems you would probably be better off with some other solution like Apache Zookeeper leader election. — yegodm, Jan 17 '18 at 09:57
@yegodm Thanks for the suggestion. I've evaluated similar solutions. But the solution fails if we assume a JVM can go into STW for an unreasonable (maybe not practical) time and then come back up. — rajneesh2k10, Jan 17 '18 at 16:36
Actually not quite true as leader election is based on tracking heartbeat of the connected nodes. So, once the current leader does not respond within the defined timeout (for example because of a long STW), the next node becomes the leader. — yegodm, Jan 17 '18 at 16:54
@yegodm Well, let me explain a little bit. Once a leader is elected and is supposed to perform a task, leader goes into STW. Another leader comes up because of no heartbeat and performs the task. First leader comes back up and before it could heartbeat or detect that it's not a leader, it performs the task which is already done. If the task is not idempotent, you see an inconsistency. Only way here is to tune the heartbeat and leader election interval. So, the solution is still dependent on how long a JVM pause can be. — rajneesh2k10, Jan 17 '18 at 18:49
Agree, yet solution heavily depends on the nature of the task. If, for example, it's a database transaction, optimistic locking can probably be used to prevent suddenly revived ex-leader from committing stale data. — yegodm, Jan 17 '18 at 19:02
@yegodm True. Optimistic locking is indeed in place. However, in a data store where records can be deleted, optimistic locking might not help in certain situations. For example, a process sees no record in the table and is ready to write a new item with optimistic concurrency control in place. It goes into STW; another process comes in and writes a new item, and then yet another process deletes that item. The paused process comes back and fires its write, which simply succeeds because the record is absent again. This may or may not be acceptable in a given situation. So the data store will need to provide a stronger primitive for locking. — rajneesh2k10, Jan 17 '18 at 20:51

score 2 · Answer 1 · answered Jan 17 '18 at 20:52

Specialized real-time JVMs exist; they usually have specialized no-allocation threads that are not suspended during GCs. But those usually target embedded systems and would be a fairly blunt tool if your goal is just to keep a DB lock from expiring.

Alternatively, Red Hat offers OpenJDK builds with Shenandoah, and Azul offers C4; both are approximately pauseless collectors.

Or you could simply try tuning your JVM to get the pause times below the lock timeout.
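If you go the tuning route, the lock holder can at least check how much lease time is left right before each side effect. The following is only a sketch under assumptions: the `Lease` class, its injected clock, and the 5 s / 1 s numbers are hypothetical and not from the question. It narrows the window but does not close it, because a pause can still hit between the check and the write.

```java
import java.util.function.LongSupplier;

/** Hypothetical helper: tracks when the DB lock expires on a monotonic clock. */
final class Lease {
    private final LongSupplier nanoClock; // e.g. System::nanoTime
    private final long deadlineNanos;

    /**
     * Create this BEFORE sending the lock request to the database, so the
     * local deadline is never later than the one the database enforces.
     */
    Lease(LongSupplier nanoClock, long durationNanos) {
        this.nanoClock = nanoClock;
        this.deadlineNanos = nanoClock.getAsLong() + durationNanos;
    }

    /** True if at least marginNanos remain before the lease expires. */
    boolean hasAtLeast(long marginNanos) {
        // Subtract first so the comparison stays correct if nanoTime wraps.
        return deadlineNanos - nanoClock.getAsLong() >= marginNanos;
    }
}

final class LeaseDemo {
    public static void main(String[] args) {
        Lease lease = new Lease(System::nanoTime, 5_000_000_000L); // assumed 5 s lock
        // The margin should exceed the longest pause you tuned for (assumed 1 s).
        if (lease.hasAtLeast(1_000_000_000L)) {
            System.out.println("lease still valid, proceeding");
        } else {
            System.out.println("lease too close to expiry, aborting");
        }
    }
}
```

The margin plays the role of the "bounded pause" number: the check is only as good as that bound.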

"To achieve that, I maintain a lock in a database which expires with time."

You should consider implementing lock-free read-copy-(atomic)update algorithms instead.
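To illustrate the read-copy-update idea, here is a minimal in-process sketch using `AtomicReference.compareAndSet`. The `Account` and `AccountCell` names are made up for the example. Across machines the same shape would presumably be a conditional write in your database (update only if the version is unchanged), which this sketch does not show.

```java
import java.util.concurrent.atomic.AtomicReference;

/** Immutable snapshot; every update creates a new instance. */
final class Account {
    final long balance;
    final long version;

    Account(long balance, long version) {
        this.balance = balance;
        this.version = version;
    }
}

/** Hypothetical holder that updates its state without taking a lock. */
final class AccountCell {
    private final AtomicReference<Account> ref =
            new AtomicReference<>(new Account(0, 0));

    Account read() {
        return ref.get();
    }

    /** Read the current snapshot, build a copy, publish it atomically. */
    long deposit(long amount) {
        while (true) {
            Account current = ref.get();
            Account updated = new Account(current.balance + amount, current.version + 1);
            // Succeeds only if nobody replaced the snapshot since we read it;
            // otherwise loop and retry against the newer snapshot.
            if (ref.compareAndSet(current, updated)) {
                return updated.balance;
            }
        }
    }
}

final class AccountDemo {
    public static void main(String[] args) {
        AccountCell cell = new AccountCell();
        cell.deposit(10);
        cell.deposit(5);
        System.out.println(cell.read().balance + " at version " + cell.read().version);
    }
}
```

A thread that is paused between the read and the compare-and-set cannot overwrite newer state; its update fails and is retried.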

"If the task is not idempotent, you see an inconsistency."

So making your tasks idempotent would be another option.
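One way to get there is to give every task a unique ID and record which IDs have already been applied. A rough sketch, with a hypothetical `IdempotentExecutor` and an in-memory map standing in for what would really be a table in your database:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Hypothetical executor that runs each task ID at most once. */
final class IdempotentExecutor {
    // Stand-in for a "completed tasks" table with a unique key on the task ID.
    private final ConcurrentMap<String, Boolean> done = new ConcurrentHashMap<>();

    /** Returns true if this call ran the task, false if the ID was already claimed. */
    boolean runOnce(String taskId, Runnable task) {
        // putIfAbsent is atomic: exactly one caller per taskId sees null.
        if (done.putIfAbsent(taskId, Boolean.TRUE) != null) {
            return false;
        }
        // Simplification: the ID is claimed before the task runs, so a task
        // that throws is not retried. With a real database the marker and
        // the task's effect would be written in one transaction.
        task.run();
        return true;
    }
}

final class IdempotentDemo {
    public static void main(String[] args) {
        IdempotentExecutor executor = new IdempotentExecutor();
        Runnable job = () -> System.out.println("sending invoice 42");
        System.out.println(executor.runOnce("invoice-42", job)); // runs the job
        System.out.println(executor.runOnce("invoice-42", job)); // skipped
    }
}
```

With this in place, a revived ex-leader that repeats the task becomes a no-op instead of an inconsistency.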

Generally your application design seems to be incorrect if its consistency guarantees rely on a lock that can be violated spontaneously.
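A commonly described way to stop relying on the lock alone is a fencing token: the lock service hands out an increasing number with each lock grant, and the resource rejects writes that carry an older number. I did not name this above, so treat it as one possible direction. The `FencedStore` below is a hypothetical in-memory stand-in for the protected resource.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical store that remembers the highest fencing token it has accepted. */
final class FencedStore {
    private final Map<String, String> data = new HashMap<>();
    private long highestToken = -1;

    /** Applies the write only if the token is not older than one already seen. */
    synchronized boolean write(long token, String key, String value) {
        if (token < highestToken) {
            return false; // stale lock holder, e.g. one waking up from a long pause
        }
        highestToken = token;
        data.put(key, value);
        return true;
    }

    synchronized String read(String key) {
        return data.get(key);
    }
}

final class FencingDemo {
    public static void main(String[] args) {
        FencedStore store = new FencedStore();
        // Process A got the lock with token 1, then paused before writing.
        // Its lock expired and process B got the lock with token 2.
        System.out.println(store.write(2, "job", "done by B"));
        // A wakes up and tries to write with its old token.
        System.out.println(store.write(1, "job", "done by A"));
        System.out.println(store.read("job"));
    }
}
```

The check happens at the resource, so it holds no matter how long the stale process was paused. The cost is that the store has to support such a conditional write, which is the stronger primitive mentioned in the comments.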

Thanks for the response. These are valuable suggestions. In the worst case, I'm looking for options to tune the JVM so that I have a bounded STW pause; the lock duration can then be based on that number. I understand why you would say "design seems incorrect". A solution which guarantees no lock violation requires a stronger primitive (atomicity), which has a much higher cost in terms of performance. So I'm considering a tradeoff. — rajneesh2k10, Jan 17 '18 at 21:14
