
I'm running a large Java EE 7 application on a two-node JBoss cluster which was recently upgraded from JBoss EAP 6 to JBoss EAP 7.0.4. Intermittently, responses become very slow, to the point that the application is barely usable. After a couple of minutes, the issue resolves itself, and operation returns to normal.

Thread dump analysis shows that the incident is caused by the following behavior:

  1. A thread A tries to write to the session cache, but has to wait to acquire a lock on the cache entry. It is put in TIMED_WAITING state.
  2. A thread B tries to obtain the JSF Flash for the same session. In doing so, it tries to update the session, but because thread A is already locking the session, it is put in BLOCKED state.
  3. All threads C that try to obtain the JSF Flash (for any session) are put in BLOCKED state, thus making the application unresponsive.

Steps 1 and 2 are caused by the fact that we're using a synchronous distributed Infinispan session cache, and don't really cause a problem as the locking is local to one user session. Step 3, however, is extremely problematic, because a blocked update to one user's session suddenly impacts all users.
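To illustrate why step 3 hurts, here is a minimal sketch of the pattern in plain Java. It does not use the real JSF or Undertow classes: `GLOBAL_CONTEXT` is a hypothetical stand-in for the application-wide ServletContext, and the latch merely simulates thread B being stuck on a slow session update. The point is only that threads working on unrelated sessions end up BLOCKED as soon as they need the same global monitor.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;

public class GlobalLockDemo {

    /** Hypothetical stand-in for the application-wide ServletContext used as a monitor. */
    private static final Object GLOBAL_CONTEXT = new Object();

    /**
     * Starts one thread that holds the global monitor while it waits, plus
     * {@code requests} threads for unrelated sessions that need the same monitor.
     * Returns how many of those request threads were observed in BLOCKED state.
     */
    public static int countBlockedRequests(int requests) throws InterruptedException {
        CountDownLatch lockHeld = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);

        Thread slow = new Thread(() -> {
            synchronized (GLOBAL_CONTEXT) {
                lockHeld.countDown();
                try {
                    release.await(); // simulates thread B stuck on a slow session update
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        }, "thread-B");
        slow.start();
        lockHeld.await(); // make sure the monitor is taken before the others start

        List<Thread> others = new ArrayList<>();
        for (int i = 0; i < requests; i++) {
            Thread t = new Thread(() -> {
                synchronized (GLOBAL_CONTEXT) {
                    // a real request would update its own, unrelated session here
                }
            }, "thread-C-" + i);
            t.start();
            others.add(t);
        }

        // Poll until every request thread is waiting on the monitor (give up after ~5 s).
        int blocked = 0;
        long deadline = System.currentTimeMillis() + 5_000;
        while (System.currentTimeMillis() < deadline) {
            blocked = 0;
            for (Thread t : others) {
                if (t.getState() == Thread.State.BLOCKED) {
                    blocked++;
                }
            }
            if (blocked == requests) {
                break;
            }
            Thread.sleep(10);
        }

        release.countDown(); // let thread B finish so everyone can proceed
        slow.join();
        for (Thread t : others) {
            t.join();
        }
        return blocked;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("blocked requests: " + countBlockedRequests(4));
    }
}
```

Every request thread stays blocked for as long as the single holder is stuck, regardless of which session it belongs to, which matches what the thread dumps show.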

The stack traces for all threads C are identical:

stackTrace:
java.lang.Thread.State: BLOCKED (on object monitor)
    at com.sun.faces.context.flash.ELFlash.getFlash(ELFlash.java:318)
    - waiting to lock <0x00007ef80a75e260> (a io.undertow.servlet.spec.ServletContextImpl)

Now this was very surprising to me. Why would all requests trying to obtain the JSF Flash be blocked on the ServletContext, which is global to the application? The cause can be found in the Mojarra ELFlash source code (line 318 contains the synchronized statement):

if (appMap.get(EnableDistributable.getQualifiedName()) != null) {
    synchronized (extContext.getContext()) {
        if (extContext.getSession(false) != null) {
            SessionHelper sessionHelper = SessionHelper.getInstance(extContext);
            if (sessionHelper == null) {
                sessionHelper = new SessionHelper();
            }
            sessionHelper.update(extContext, flash);
        }
    }
}

In a nutshell, what's happening here is that at some point during the development of Mojarra 2.2.x, the developers decided that if a JSF Flash is operated in a distributed environment, the Flash state needs to be stored in the session (if present) so that it can be replicated across the cluster.

While I don't necessarily take issue with this, I have the following questions:

  1. What is the reasoning behind synchronizing on the global ServletContext? This seems overly broad and can potentially block a large number of requests.
  2. Is there any way of changing this behavior, specifically for the Flash? I've pored over the source code, but cannot seem to find a way other than removing <distributable/> from the web.xml, or potentially unsetting the com.sun.faces.enableDistributable context parameter. Both of these options have an impact that is too large for my use case.
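To make question 1 more concrete, this is roughly the kind of narrower locking I would have expected. It is only a sketch under my own assumptions, not Mojarra's code or a proposed patch: `SessionLocks`, `lockFor` and `release` are hypothetical names, and the sketch ignores whether the lock object itself would need to survive session replication.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Hypothetical helper that hands out one monitor object per session id. */
public final class SessionLocks {

    private static final ConcurrentMap<String, Object> LOCKS = new ConcurrentHashMap<>();

    private SessionLocks() {
    }

    /** Returns the same lock object for the same session id, creating it on first use. */
    public static Object lockFor(String sessionId) {
        return LOCKS.computeIfAbsent(sessionId, id -> new Object());
    }

    /** Would be called when a session is destroyed, so the map does not grow forever. */
    public static void release(String sessionId) {
        LOCKS.remove(sessionId);
    }

    public static void main(String[] args) {
        Object a1 = lockFor("session-A");
        Object a2 = lockFor("session-A");
        Object b = lockFor("session-B");

        System.out.println(a1 == a2); // true: same session, same monitor
        System.out.println(a1 == b);  // false: different sessions do not contend
    }
}
```

With something like this, the block in ELFlash could synchronize on `SessionLocks.lockFor(sessionId)` instead of the ServletContext, so a slow update to one user's session would only hold up requests for that same session.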

Update

I've opened a GitHub issue (#4376) about this behavior on the Mojarra issue tracker.

Robby Cornelissen
  • If using sticky sessions, Infinispan session cache can be configured to be async instead of sync. In fact, IIRC, async is the default out-of-the-box setting. – Galder Zamarreño Jul 11 '18 at 15:00
  • @GalderZamarreño Yes, async is indeed the default setting, but that doesn't negate the fact that there are valid use cases for a synchronous session cache. Also, the option of having an asynchronous session cache does not justify blocking access to the cache on anything higher than the session level. – Robby Cornelissen Jul 12 '18 at 01:10
