
We have built a retail solution in which session handling is taken care of by Spring Session. We use GemFire to store the session objects, with the Spring Session Data GemFire client module as the client. After some initial setup it was eventually up and running. But when the load on the containers running the application increased, we started facing serious issues with how the application responds. Response time increases to the point that no requests are processed at all.

We analysed the thread dumps and could see that many threads related to the GemFire client are in a BLOCKED or WAITING state. The JVM parameters, CPU usage and heap memory all seem to be fine, even under the load that causes the issue.

We see the following in the thread dump analysis:

Thread Contention Servlet - dispatcher:render Blocked on org.apache.geode.cache.client.internal.ConnectionImpl@3afbbf9
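To measure how many threads are stuck on the same client connection monitor without taking repeated thread dumps by hand, the standard `java.lang.management.ThreadMXBean` API can be queried from inside the JVM. This is a minimal sketch: the class name `BlockedThreadReport` is hypothetical, and filtering on the fragment `ConnectionImpl` assumes lock names look like the one in the dump line above.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: groups BLOCKED threads by the monitor they are waiting to enter.
public class BlockedThreadReport {

    /** Counts BLOCKED threads per lock whose name contains the given fragment. */
    public static Map<String, Integer> blockedByLock(String lockNameFragment) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        // Snapshot of all live threads; monitor/synchronizer ownership details are not needed here.
        ThreadInfo[] infos = mx.dumpAllThreads(false, false);
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (ThreadInfo info : infos) {
            if (info == null || info.getThreadState() != Thread.State.BLOCKED) {
                continue;
            }
            // Lock name has the form "<class name>@<identity hash in hex>".
            String lock = info.getLockName();
            if (lock != null && lock.contains(lockNameFragment)) {
                counts.merge(lock, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        blockedByLock("ConnectionImpl").forEach((lock, n) ->
                System.out.println(n + " thread(s) blocked on " + lock));
    }
}
```

A single lock name with a large and growing count points to many request threads serialising on one pooled connection, which matches the "Blocked on ...ConnectionImpl@..." line in the dump.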

The following is observed in the GemFire client logs:

4/30/19 12:03:21.559 PM 2019-04-30 12:03:21,559 [Cache Client Updater Thread on XX.XX.XX.XX(XXXXX:62475):1024 port 40404] INFO : Redundant subscription endpoint XXXXX:40404 crashed. Scheduling recovery.

The first blacklisting log entry appears as:

4/30/19 12:03:21.631 PM 2019-04-30 12:03:21,630 [queueTimer-DEFAULT] WARN : Cache Client Updater Thread on XX.XX.XX.XX(XXXXX:76221):1024 port 40404 (XXXXX:40404): Caught following exception while attempting to create a server-to-client communication socket and will exit: org.apache.geode.cache.client.ServerRefusedConnectionException: :40404 refused connection: java.lang.Exception: This client is blacklisted by server

After it is blacklisted, that app instance is effectively dead: it can no longer accept and process any requests.

Any help with this blacklisting is much appreciated.

Osama AbuSitta

1 Answer


Being blacklisted (the newer term is "denylisted") means that the client was too slow to respond to events.

You can read more about how to manage and prevent slow receivers in the docs: https://gemfire.docs.pivotal.io/98/geode/managing/monitor_tune/slow_receivers.html

The main takeaway from those docs is to make sure the clients have enough resources and are not being starved of CPU cycles, network, disk or RAM.

If the clients are running in a virtualized environment, look at the steal time or ready time (from the vSphere perspective) for the VM running the client. If the client is running in a container, also make sure it is not being throttled for going over its CPU quota.
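For the container case on Linux, the kernel reports CFS bandwidth throttling in the cgroup `cpu.stat` file (`nr_periods`, `nr_throttled`, plus `throttled_usec` on cgroup v2 or `throttled_time` on cgroup v1). The Java sketch below assumes cgroup v2 mounted at `/sys/fs/cgroup`; on cgroup v1 the file is typically under `/sys/fs/cgroup/cpu/`. The class and method names are hypothetical and not part of GemFire or Spring.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: reads cgroup CPU stats and reports how often the container was throttled.
public class CgroupThrottleCheck {

    /** Parses the "key value" lines of a cgroup cpu.stat file into a map. */
    public static Map<String, Long> parseCpuStat(String content) {
        Map<String, Long> stats = new HashMap<>();
        for (String line : content.split("\\R")) {
            String[] parts = line.trim().split("\\s+");
            if (parts.length != 2) {
                continue; // skip blank or unexpected lines
            }
            try {
                stats.put(parts[0], Long.parseLong(parts[1]));
            } catch (NumberFormatException ignored) {
                // skip non-numeric values
            }
        }
        return stats;
    }

    /** Fraction of CFS enforcement periods in which the cgroup was throttled (0.0 if unknown). */
    public static double throttledRatio(Map<String, Long> stats) {
        long periods = stats.getOrDefault("nr_periods", 0L);
        long throttled = stats.getOrDefault("nr_throttled", 0L);
        return periods == 0 ? 0.0 : (double) throttled / periods;
    }

    public static void main(String[] args) throws IOException {
        Path path = Path.of("/sys/fs/cgroup/cpu.stat"); // assumed cgroup v2 location
        if (!Files.exists(path)) {
            System.out.println("cpu.stat not found at " + path);
            return;
        }
        Map<String, Long> stats = parseCpuStat(Files.readString(path));
        System.out.printf("throttled in %.1f%% of periods%n", 100 * throttledRatio(stats));
    }
}
```

A ratio that climbs while the load test runs suggests the client JVM is being starved by its CPU quota even if the CPU usage graphs look fine, which would fit the slow-receiver explanation.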

The denylist code in Apache Geode: https://github.com/apache/geode/search?q=denylist&unscoped_q=denylist