I'm running a large Java EE 7 application on a two-node JBoss cluster that was recently upgraded from JBoss EAP 6 to JBoss EAP 7.0.4. Intermittently, the application becomes very slow, to the point where it is barely accessible. After a couple of minutes, the issue resolves itself and operation returns to normal.
Thread dump analysis shows that the incident is caused by the following behavior:
1. Thread A tries to write to the session cache but has to wait to acquire a lock on the cache entry. It is put in the TIMED_WAITING state.
2. Thread B tries to obtain the JSF `Flash` for the same session. In doing so, it tries to update the session, but because thread A is already locking the session, it is put in the BLOCKED state.
3. All threads C that try to obtain the JSF `Flash` (for any session) are put in the BLOCKED state, making the application unresponsive.
Steps 1 and 2 are caused by the fact that we're using a synchronous distributed Infinispan session cache; they don't really cause a problem, because the locking is local to one user session. Step 3, however, is extremely problematic, because a blocked update to one user's session suddenly impacts all users.
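The three steps can be reproduced outside JBoss with plain Java monitors. This is a minimal sketch, not Mojarra or Infinispan code, and it assumes that thread A holds the session's monitor while it waits for the cache lock. `GLOBAL_CONTEXT`, `sessionX` and `sessionY` are hypothetical stand-ins for the `ServletContext` and two different users' sessions, and the timed wait on a latch stands in for the wait on the Infinispan entry lock.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class FlashLockConvoyDemo {

    // Hypothetical stand-in for the application-wide ServletContext monitor.
    static final Object GLOBAL_CONTEXT = new Object();

    static Thread.State[] run() throws InterruptedException {
        Object sessionX = new Object(); // session shared by threads A and B
        Object sessionY = new Object(); // an unrelated user's session, used by thread C
        CountDownLatch sessionHeld = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);

        // Thread A: holds session X while waiting (with a timeout) for a slow cache lock.
        Thread a = new Thread(() -> {
            synchronized (sessionX) {
                sessionHeld.countDown();
                try {
                    release.await(30, TimeUnit.SECONDS); // reported as TIMED_WAITING
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        }, "A");

        // Thread B: takes the global monitor first, then needs session X (same order as ELFlash).
        Thread b = new Thread(() -> {
            synchronized (GLOBAL_CONTEXT) {
                synchronized (sessionX) {
                    // the flash state would be written to session X here
                }
            }
        }, "B");

        // Thread C: only needs session Y, but must pass the global monitor first.
        Thread c = new Thread(() -> {
            synchronized (GLOBAL_CONTEXT) {
                synchronized (sessionY) {
                    // the flash state would be written to session Y here
                }
            }
        }, "C");

        a.start();
        sessionHeld.await();
        awaitState(a, Thread.State.TIMED_WAITING);
        b.start();
        awaitState(b, Thread.State.BLOCKED); // B owns GLOBAL_CONTEXT and waits for session X
        c.start();
        awaitState(c, Thread.State.BLOCKED); // C waits for GLOBAL_CONTEXT although it uses session Y

        Thread.State[] states = {a.getState(), b.getState(), c.getState()};
        release.countDown(); // let A finish, which unblocks B and then C
        a.join();
        b.join();
        c.join();
        return states;
    }

    // Polls until the thread reaches the expected state.
    static void awaitState(Thread t, Thread.State expected) throws InterruptedException {
        while (t.getState() != expected) {
            Thread.sleep(1);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Thread.State[] s = run();
        System.out.println("A=" + s[0] + " B=" + s[1] + " C=" + s[2]);
    }
}
```

The captured states mirror the thread dump: A is TIMED_WAITING, B is BLOCKED on session X, and C is BLOCKED on the global monitor even though it never touches session X. Because B acquires the global monitor before the session one, a single slow session ends up serializing every other user behind it.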
The stack traces for all threads C are identical:
```
stackTrace:
java.lang.Thread.State: BLOCKED (on object monitor)
	at com.sun.faces.context.flash.ELFlash.getFlash(ELFlash.java:318)
	- waiting to lock <0x00007ef80a75e260> (a io.undertow.servlet.spec.ServletContextImpl)
```
This was very surprising to me. Why would all requests trying to obtain the JSF flash be blocked on the `ServletContext`, which is global to the application? The cause can be found in the Mojarra `ELFlash` source code (line 318 contains the `synchronized` statement):
```java
if (appMap.get(EnableDistributable.getQualifiedName()) != null) {
    synchronized (extContext.getContext()) { // line 318: locks the application-wide ServletContext
        if (extContext.getSession(false) != null) {
            SessionHelper sessionHelper = SessionHelper.getInstance(extContext);
            if (sessionHelper == null) {
                sessionHelper = new SessionHelper();
            }
            sessionHelper.update(extContext, flash);
        }
    }
}
```
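If the goal is only to serialize flash updates per user, one possible direction is to synchronize on a per-session monitor instead of the application-wide `ServletContext`. This is a hypothetical sketch, not Mojarra's code and not an existing configuration option: `SessionScopedLocks` and its methods are invented for illustration, and in a servlet container the key would presumably be `HttpSession.getId()`. Note that both this lock and the original `ServletContext` monitor are local to one JVM, so neither serializes requests across cluster nodes.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Supplier;

/** Hypothetical helper: one monitor object per session ID instead of one per application. */
public final class SessionScopedLocks {
    private final ConcurrentMap<String, Object> locks = new ConcurrentHashMap<>();

    /** Returns the monitor for the given session, creating it atomically on first use. */
    public Object lockFor(String sessionId) {
        return locks.computeIfAbsent(sessionId, id -> new Object());
    }

    /** Runs the action while holding only the monitor of the given session. */
    public <T> T withSessionLock(String sessionId, Supplier<T> action) {
        synchronized (lockFor(sessionId)) {
            return action.get();
        }
    }

    /** Intended to be called when the session ends, so the map does not grow without bound. */
    public void release(String sessionId) {
        locks.remove(sessionId);
    }

    public static void main(String[] args) {
        SessionScopedLocks registry = new SessionScopedLocks();
        System.out.println(registry.lockFor("s1") == registry.lockFor("s1")); // same session, same monitor
        System.out.println(registry.lockFor("s1") == registry.lockFor("s2")); // different sessions, different monitors
        System.out.println(registry.withSessionLock("s1", () -> "flash updated"));
    }
}
```

The trade-off is bookkeeping: `release` would have to be called when the session is destroyed (for example from an `HttpSessionListener`), and removing an entry while another thread still holds the old monitor lets a third thread proceed on a fresh one, so a production version would need to handle that race.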
In a nutshell, what's happening here is that at some point during the development of Mojarra 2.2.x, the developers decided that if a JSF `Flash` is operated in a distributed environment, the `Flash` state needs to be stored in the session (if present) so that it can be replicated across the cluster.
While I don't necessarily take issue with this, I have the following questions:
- What is the reasoning behind synchronizing on the global `ServletContext`? This seems overly broad and can potentially block a large number of requests.
- Is there any way of changing this behavior, specifically for the `Flash`? I've pored over the source code but cannot find a way other than removing `<distributable/>` from the `web.xml`, or possibly unsetting the `com.sun.faces.enableDistributable` context parameter. Both of these options have too large an impact for my use case.
Update
I've opened a GitHub issue (#4376) about this behavior on the Mojarra issue tracker.