
We are using Spring Session backed by Pivotal GemFire for session management in our application.

In production, when the load increases, the application stops responding (it hangs completely). We get an error saying that the client is blacklisted. We checked the request count and it was around 15k.

The application is deployed in containers. The protocol used is Http11AprProtocol and the max thread count is set to 200. We checked the thread dump; the relevant thread is given below.

We are not sure whether it is the containers or GemFire that cannot handle this amount of load. In GemFire, is there a specific parameter that determines the number of threads it can handle? Any help is appreciated.

Cache Client Updater Thread  on server Id=14397 in RUNNABLE (running in native)
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
at java.net.SocketInputStream.read(SocketInputStream.java:171)
at java.net.SocketInputStream.read(SocketInputStream.java:141)
at sun.security.ssl.InputRecord.readFully(InputRecord.java:465)
at sun.security.ssl.InputRecord.read(InputRecord.java:503)
at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:983)
- locked java.lang.Object@2f2e340
at sun.security.ssl.SSLSocketImpl.readDataRecord(SSLSocketImpl.java:940)
at sun.security.ssl.AppInputStream.read(AppInputStream.java:105)
- locked sun.security.ssl.AppInputStream@1ce48525
at org.apache.geode.internal.cache.tier.sockets.Message.fetchHeader(Message.java:809)


John Blum
Sreejith

1 Answer


GemFire should have no problem handling 15K requests, whether that is per second or per minute (you did not say which unit you measured in). It may require some tuning, but GemFire should be able to handle that load either way.

A few things to think about:

1) First, have a look here.

2) You can, of course, tune both sides of the client/server topology.

On the Client, you can use the PoolFactory to configure settings such as min/max connections, prSingleHopEnabled, socketBufferSize, threadLocalConnections, etc.

With Spring Session for Pivotal GemFire, the Pool used on the client (the Web app, a GemFire ClientCache) is configurable in one of two ways. If you are using the "DEFAULT" GemFire Pool, configure it through the SDG ClientCacheFactoryBean class, which you often declare yourself, like so. If you are using a specific, "named" Pool with Spring Session for Pivotal GemFire, use the PoolFactoryBean class, in which case it would look something like this...

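import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.context.annotation.Bean;
import org.springframework.data.gemfire.client.PoolFactoryBean;
import org.springframework.session.data.gemfire.config.annotation.web.http.EnableGemFireHttpSession;

@SpringBootApplication
@EnableGemFireHttpSession(poolName = "SessionPool", ...)
class MySpringSessionGemFireClientApplication {

  // Declares the named Pool that Spring Session uses to talk to the GemFire servers.
  @Bean("SessionPool")
  PoolFactoryBean sessionPool() {

    PoolFactoryBean sessionPool = new PoolFactoryBean();

    // Tune the Pool here; the values are elided and depend on your load.
    sessionPool.setMaxConnections(..);
    sessionPool.set...

    return sessionPool;
  }
}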

On the Server, it really depends on how you started the nodes (e.g. with Gfsh, using Spring, etc). But essentially, it boils down to settings on the CacheServer, for example: loadPollInterval, maxConnections, socketBufferSize, maxThreads, etc.

3) I would also say you need to collect information first to determine where the problem might be: look at the server logs, statistics, etc. What to collect should be covered in #1 above.

4) There are other factors to think about, e.g. size of data.

5) There are things you must consider from a Network standpoint, and adding "containers" adds a whole other layer of complexity, so it will depend on your use case, architecture and infrastructure.

Anyway, all of this is to say that it is difficult to tell for certain what the problem is given all the factors involved (e.g. topology, architecture, data size, configuration, app design, etc). Providing logs, stats, etc may shed some light.

Not sure why you think the Thread dump above is an "error". Yes, the "Cache Client Updater Thread" is holding an Object lock; however, the Thread also remains RUNNABLE (in service). A Thread holding a lock is only a problem if one or more other Threads are WAITING or BLOCKED on that lock, and that contention is starting to consume resources or is blocking/degrading certain application workflows.
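To check for that kind of contention from inside the JVM rather than by reading a dump by hand, something along these lines may help. This is only a minimal sketch using the standard java.lang.management API. The class name LockContentionCheck, the findBlockedThreads() helper and the demo() method with its "demo-holder" and "demo-waiter" threads are all hypothetical names made up for illustration; none of this is part of GemFire or Spring Session.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;

// Hypothetical helper class, not part of GemFire or Spring Session.
class LockContentionCheck {

    /** Lists every thread that is BLOCKED on a monitor, with the thread that owns it. */
    static List<String> findBlockedThreads() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        List<String> report = new ArrayList<>();
        // false, false: skip the full locked-monitor/synchronizer lists;
        // the lock a thread is blocked on and its owner are still reported.
        for (ThreadInfo info : mx.dumpAllThreads(false, false)) {
            if (info != null && info.getThreadState() == Thread.State.BLOCKED) {
                report.add(info.getThreadName() + " is BLOCKED on " + info.getLockName()
                        + " held by " + info.getLockOwnerName());
            }
        }
        return report;
    }

    /** Illustrative scenario: one thread holds a lock while a second one blocks on it. */
    static boolean demo() {
        final Object lock = new Object();
        CountDownLatch held = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);

        Thread holder = new Thread(() -> {
            synchronized (lock) {
                held.countDown();
                try {
                    release.await(); // keep holding the lock until told to let go
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        }, "demo-holder");

        Thread waiter = new Thread(() -> {
            synchronized (lock) {
                // acquired only after the holder releases the lock
            }
        }, "demo-waiter");

        try {
            holder.start();
            held.await();
            waiter.start();
            // Wait until the second thread is really stuck on the monitor.
            while (waiter.getState() != Thread.State.BLOCKED) {
                Thread.sleep(10);
            }
            boolean detected = false;
            for (String line : findBlockedThreads()) {
                System.out.println(line);
                if (line.startsWith("demo-waiter") && line.endsWith("demo-holder")) {
                    detected = true;
                }
            }
            return detected;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            release.countDown(); // let both demo threads finish
        }
    }
}
```

In a real application you would call only findBlockedThreads(), for example from a scheduled task or an admin endpoint while the app is under load. An empty list while the "Cache Client Updater Thread" holds its lock would suggest that lock is not what is hanging the application.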

I suspect you have some issue between GemFire and the container, but I cannot say that for certain.

John Blum