0

I've 2 application in which service 1 consumes service 2 with service 2 replicated into 3 instances. Service 1 connects to service 2 using spring-cloud-zookeeper-discovery.

The application works as expected until I bring down one of the instance of service 2, the ZNode for the service instance gets deleted by 40s, I see negotiated session timeout is 40s, however I've created another instance service 3 which has same last negotiated session timeout to 40s, however the ZNode for the service 3 instance gets deleted within 1-2s.

Below is zookeeper server log after disconnect

2019-12-17 17:02:02,829 [myid:] - WARN  [NIOWorkerThread-5:NIOServerCnxn@370] - Exception causing close of session 0x10005eb831b0000: An existing connection was forcibly closed by the remote host
2019-12-17 17:02:02,830 [myid:] - DEBUG [NIOWorkerThread-5:NIOServerCnxn@373] - IOException stack trace
java.io.IOException: An existing connection was forcibly closed by the remote host
        at sun.nio.ch.SocketDispatcher.read0(Native Method)
        at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:43)
        at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
        at sun.nio.ch.IOUtil.read(IOUtil.java:197)
        at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380)
        at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:321)
        at org.apache.zookeeper.server.NIOServerCnxnFactory$IOWorkRequest.doWork(NIOServerCnxnFactory.java:530)
        at org.apache.zookeeper.server.WorkerService$ScheduledWorkRequest.run(WorkerService.java:155)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
2019-12-17 17:02:02,833 [myid:] - DEBUG [NIOWorkerThread-5:MBeanRegistry@127] - Unregister MBean [org.apache.ZooKeeperService:name0=StandaloneServer_port2181,name1=Connections,name2="0:0:0:0:0:0:0:1",name3=0x10005eb831b0000]
2019-12-17 17:02:02,834 [myid:] - DEBUG [NIOWorkerThread-5:NIOServerCnxn@627] - Closed socket connection for client /0:0:0:0:0:0:0:1:61987 which had sessionid 0x10005eb831b0000
2019-12-17 17:02:42,033 [myid:] - TRACE [SessionTracker:SessionTrackerImpl@213] - Session closing: 0x10005eb831b0000
2019-12-17 17:02:42,033 [myid:] - INFO  [SessionTracker:ZooKeeperServer@398] - Expiring session 0x10005eb831b0000, timeout of 40000ms exceeded
2019-12-17 17:02:42,036 [myid:] - TRACE [ProcessThread(sid:0 cport:2181)::ZooTrace@88] - :Psessionid:0x10005eb831b0000 type:closeSession cxid:0x0 zxid:0xfffffffffffffffe txntype:unknown reqpath:n/a
2019-12-17 17:02:42,045 [myid:] - TRACE [ProcessThread(sid:0 cport:2181)::SessionTrackerImpl@213] - Session closing: 0x10005eb831b0000
2019-12-17 17:02:42,055 [myid:] - DEBUG [SyncThread:0:FinalRequestProcessor@99] - Processing request:: sessionid:0x10005eb831b0000 type:closeSession cxid:0x0 zxid:0x2d txntype:-11 reqpath:n/a
2019-12-17 17:02:42,058 [myid:] - TRACE [SyncThread:0:ZooTrace@88] - :Esessionid:0x10005eb831b0000 type:closeSession cxid:0x0 zxid:0x2d txntype:-11 reqpath:n/a
2019-12-17 17:02:42,060 [myid:] - DEBUG [SyncThread:0:DataTree@1052] - Deleting ephemeral node /services/kp-services/0696aa59-918a-4b7c-b80c-a26aaed55e2f for session 0x10005eb831b0000
2019-12-17 17:02:42,068 [myid:] - DEBUG [SyncThread:0:DataTree@1052] - Deleting ephemeral node /kp/leader/cluster/_c_6d664f66-5eae-450f-b9e4-a1d0f3b7c8c3-lock-0000000000 for session 0x10005eb831b0000
2019-12-17 17:02:42,071 [myid:] - DEBUG [SyncThread:0:SessionTrackerImpl@223] - Removing session 0x10005eb831b0000
2019-12-17 17:02:42,082 [myid:] - TRACE [SyncThread:0:ZooTrace@71] - SessionTrackerImpl --- Removing session 0x10005eb831b0000
2019-12-17 17:02:50,175 [myid:] - DEBUG [Session-HouseKeeper-1f36e637:HouseKeeper@264] - node0 scavenging sessions
2019-12-17 17:02:50,177 [myid:] - DEBUG [Session-HouseKeeper-1f36e637:SessionHandler@1344] - org.eclipse.jetty.server.session.SessionHandler1529306539==dftMaxIdleSec=-1 scavenging sessions
2019-12-17 17:02:50,186 [myid:] - DEBUG [Session-HouseKeeper-1f36e637:SessionHandler@1352] - org.eclipse.jetty.server.session.SessionHandler1529306539==dftMaxIdleSec=-1 scavenging session ids []
2019-12-17 17:02:50,194 [myid:] - DEBUG [Session-HouseKeeper-1f36e637:AbstractSessionCache@647] - org.eclipse.jetty.server.session.DefaultSessionCache@185d8b6[evict=-1,removeUnloadable=false,saveOnCreate=false,saveOnInactiveEvict=false] checking expiration on []
2019-12-17 17:02:50,659 [myid:] - INFO  [ContainerManagerTask:ContainerManager@119] - Attempting to delete candidate container: /services/kp-services
2019-12-17 17:02:50,659 [myid:] - TRACE [ProcessThread(sid:0 cport:2181)::ZooTrace@88] - :Psessionid:0x0 type:deleteContainer cxid:0x0 zxid:0xfffffffffffffffe txntype:unknown reqpath:n/a
2019-12-17 17:02:50,666 [myid:] - INFO  [ContainerManagerTask:ContainerManager@119] - Attempting to delete candidate container: /kp/leader/cluster
2019-12-17 17:02:50,670 [myid:] - TRACE [ProcessThread(sid:0 cport:2181)::ZooTrace@88] - :Psessionid:0x0 type:deleteContainer cxid:0x0 zxid:0xfffffffffffffffe txntype:unknown reqpath:n/a
2019-12-17 17:02:50,671 [myid:] - DEBUG [SyncThread:0:FinalRequestProcessor@99] - Processing request:: sessionid:0x0 type:deleteContainer cxid:0x0 zxid:0x2e txntype:20 reqpath:n/a
2019-12-17 17:02:50,682 [myid:] - TRACE [SyncThread:0:ZooTrace@88] - :Esessionid:0x0 type:deleteContainer cxid:0x0 zxid:0x2e txntype:20 reqpath:n/a
2019-12-17 17:02:50,686 [myid:] - DEBUG [SyncThread:0:FinalRequestProcessor@99] - Processing request:: sessionid:0x0 type:deleteContainer cxid:0x0 zxid:0x2f txntype:20 reqpath:n/a
2019-12-17 17:02:50,696 [myid:] - TRACE [SyncThread:0:ZooTrace@88] - :Esessionid:0x0 type:deleteContainer cxid:0x0 zxid:0x2f txntype:20 reqpath:n/a

Version Info
zookeeper: 3.5.5
spring-cloud-zookeeper-discovery: 2.0.1
Curator: 4.2.0

halfer
  • 19,824
  • 17
  • 99
  • 186
Karthik Prasad
  • 9,662
  • 10
  • 64
  • 112
  • There's not enough information here. Can you provide a test of some kind or logs? – Randgalt Dec 16 '19 at 23:37
  • @Randgalt let me know what kind of information? I too stuck in the same place, from where to begin debugging, from past week beating around bush, I was expecting there might some parameters which I need to reset in zookeper/sock connection. – Karthik Prasad Dec 17 '19 at 07:41
  • @Randgalt added zookeeper server log – Karthik Prasad Dec 17 '19 at 11:37
  • This is a server log and doesn't show much. The log of the client that's getting disconnected is needed. Also, you say the ZNode for service 3 is deleted. Does service 3 stay connected to ZooKeeper? If so, that doesn't make sense. Anyway, the log for service 3 might help. – Randgalt Dec 17 '19 at 13:35
  • @Randgalt I think I got the issue, when Network Partition happens it takes 40s to close the session, however when I gracefully close the application, the switch happens within 1-2s, Can you confirm this? – Karthik Prasad Dec 17 '19 at 15:09
  • By network partition I mean at between client and server and not server side(Zookeeper is CP though) or client failure. – Karthik Prasad Dec 17 '19 at 15:55
  • If you close the ZooKeeper/Curator object the session immediately ends and all ephemeral nodes for that session are deleted. – Randgalt Dec 17 '19 at 18:26

0 Answers0