1

We are seeing a weird issue in our system.

For example, we have a cluster consisting of 2 nodes. A Geode Locator runs on the master node, and one remote node runs a Geode client. When the client experiences network problems (packet loss) for some time, it fails to connect to the Locator (NoAvailableLocatorException). The weird thing is that even after the network returns to its normal state, the client still fails to connect to the Locator with the same exception, even after the client is restarted. After investigating, we found that the Locator's port is stuck in SYN_RECV, and when we restart the Locator the issue seems to go away. Can you give us any clue as to how we can solve this issue, and why the server ports get stuck in SYN_RECV? We don't want to restart the cluster, or track down the Locators and servers and restart each of them.

mdavid

2 Answers

0

I'm not sure what you mean by SYS_RECV. It's not a state that I'm familiar with. Do you mean SYN_RECV? A SYN_RECV state indicates that a connection request (SYN) has been received and the socket is waiting for the follow-up ACK that completes the handshake. Are there a lot of these, or only one, on the port that the locator was configured to use? A thread dump of the locator would help to show what it's up to.

The locator has one server socket for location-services processing. The server socket should be in a LISTEN state on the configured locator port, waiting for connection requests. Accepted connections are handed off to a thread pool, where the request data is read from the socket and the request is processed. These sockets should be in ESTABLISHED state. That same thread sends a response back on the same socket. After the client reads the response, the connection to the locator is aborted in order to avoid having sockets in a TIME-WAIT state.
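The accept-and-hand-off pattern described above can be sketched in plain Java. This is only an illustration of the general pattern, not Geode's actual locator code: the class name `AcceptAndHandOff`, the line-based "protocol", and the pool size are all hypothetical. Note that `accept()` only returns connections whose TCP handshake has already completed, so a socket sitting in SYN_RECV never reaches the thread pool.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Hypothetical illustration only; this is not Geode's locator implementation. */
public class AcceptAndHandOff {

    /** Opens a listening socket on an ephemeral loopback port and starts an acceptor thread. */
    static ServerSocket start(ExecutorService pool, int readTimeoutMillis) throws IOException {
        ServerSocket server = new ServerSocket(0, 50, InetAddress.getLoopbackAddress());
        Thread acceptor = new Thread(() -> {
            while (!server.isClosed()) {
                try {
                    // accept() returns only fully established connections
                    Socket conn = server.accept();
                    // hand the connection to a pool thread for reading and replying
                    pool.execute(() -> handle(conn, readTimeoutMillis));
                } catch (IOException e) {
                    return; // accept() fails once the server socket is closed
                }
            }
        });
        acceptor.setDaemon(true);
        acceptor.start();
        return server;
    }

    /** Reads one request line and answers on the same socket, from the same thread. */
    static void handle(Socket conn, int readTimeoutMillis) {
        try (Socket s = conn) {
            s.setSoTimeout(readTimeoutMillis); // bound the time spent waiting for request data
            BufferedReader in = new BufferedReader(
                    new InputStreamReader(s.getInputStream(), StandardCharsets.UTF_8));
            PrintWriter out = new PrintWriter(
                    new OutputStreamWriter(s.getOutputStream(), StandardCharsets.UTF_8), true);
            String request = in.readLine();
            out.println("response:" + request);
        } catch (IOException e) {
            // read timeout or broken connection: drop it and free the pool thread
        }
    }

    /** Client side: connect, send one request, read the reply, then close the socket. */
    static String request(int port, String message) throws IOException {
        try (Socket s = new Socket()) {
            s.connect(new InetSocketAddress(InetAddress.getLoopbackAddress(), port), 2000);
            s.setSoTimeout(2000);
            PrintWriter out = new PrintWriter(
                    new OutputStreamWriter(s.getOutputStream(), StandardCharsets.UTF_8), true);
            BufferedReader in = new BufferedReader(
                    new InputStreamReader(s.getInputStream(), StandardCharsets.UTF_8));
            out.println(message);
            return in.readLine();
        }
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try (ServerSocket server = start(pool, 10_000)) {
            System.out.println(request(server.getLocalPort(), "locate")); // prints "response:locate"
        } finally {
            pool.shutdownNow();
        }
    }
}
```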

0

The only thing I can think of that could possibly help, as far as Geode is concerned, is to set a lower read-timeout on the locator. The default is 60000 milliseconds.

-Dgemfire.TcpServer.READ_TIMEOUT=10000

Aside from that there's little a Java-based server can do about dropped SYN ACKs. I assume you've searched the internet and found lots of pages talking about this problem.
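To illustrate what a read timeout like the one suggested above does, here is a minimal sketch in plain Java. It is not Geode code: the class `ReadTimeoutSketch` and its methods are hypothetical, and reading the property with `Integer.getInteger` is only an assumption about how such a `-D` flag could be consumed. It also shows the limit of the setting: the timeout only applies to connections that have already been accepted, so it cannot clear sockets that are still in SYN_RECV.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

/** Hypothetical illustration only; this is not Geode's implementation. */
public class ReadTimeoutSketch {

    /** Reads the property named in the answer; the 60000 ms fallback mirrors the stated default. */
    static int configuredReadTimeout() {
        return Integer.getInteger("gemfire.TcpServer.READ_TIMEOUT", 60000);
    }

    /** Returns true if reading from a connected but silent peer times out. */
    static boolean silentPeerTimesOut(int timeoutMillis) throws IOException {
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             // the client connects but never sends any request data
             Socket silentClient = new Socket(InetAddress.getLoopbackAddress(), server.getLocalPort());
             Socket accepted = server.accept()) {
            accepted.setSoTimeout(timeoutMillis); // read() gives up after this many milliseconds
            try {
                accepted.getInputStream().read();
                return false; // data or end-of-stream arrived before the timeout
            } catch (SocketTimeoutException e) {
                return true;  // the handling thread is released instead of blocking indefinitely
            }
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("configured timeout: " + configuredReadTimeout() + " ms");
        System.out.println("timed out: " + silentPeerTimesOut(200));
    }
}
```

With a lower timeout, a thread stuck waiting on a client that went silent after connecting is freed sooner, which is the most such a setting can offer here.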