We are using the Apache Zookeeper client C bindings in our application. The client library version is 3.5.1. When the Zookeeper connection is lost, the application is configured to exit with error code 116.
Systemd is used to start and stop the application. The unit file does not override the default KillMode, so systemd sends SIGTERM to the application on stop.
When the process is stopped with systemctl stop, the Zookeeper client threads appear to attempt to reconnect to Zookeeper:
2016-04-12 22:34:45,799:4506(0xf14f7b40):ZOO_ERROR@handle_socket_error_msg@2363: Socket [128.0.0.4:61758] zk retcode=-4, errno=112(Host is down): failed while receiving a server response
2016-04-12 22:34:45,799:4506(0xf14f7b40):ZOO_INFO@check_events@2345: initiated connection to server [128.0.0.4:61758]
Apr 12 22:34:45 main thread: zookeeperWatcher: event type ZOO_SESSION_EVENT state ZOO_CONNECTING_STATE path
2016-04-12 22:34:45,801:4506(0xf14f7b40):ZOO_INFO@check_events@2397: session establishment complete on server [128.0.0.4:61758], sessionId=0x40000015b8d0077, negotiated timeout=20000
2016-04-12 22:34:46,476:4506(0xf14f7b40):ZOO_WARN@zookeeper_interest@2191: Delaying connection after exhaustively trying all servers [128.0.0.4:61758]
2016-04-12 22:34:46,810:4506(0xf14f7b40):ZOO_INFO@check_events@2345: initiated connection to server [128.0.0.4:61758]
2016-04-12 22:34:46,811:4506(0xf14f7b40):ZOO_ERROR@handle_socket_error_msg@2382: Socket [128.0.0.4:61758] zk retcode=-112, errno=116(Stale file handle): sessionId=0x40000015b8d0077 h
Because of this, the process exits with an error code. Systemd sees the failure code on exit and does not attempt to restart the application. Does anyone know why the client is getting disconnected?
I am aware that I can work around this by setting SuccessExitStatus=116 in the unit file, but I don't want to mask genuine errors. I have tried registering a signal handler for SIGTERM and closing the Zookeeper client in the handler, but the handler code never seems to get hit when I issue systemctl stop.
EDIT: The handler wasn't getting called because I had made it asynchronous: it didn't execute immediately upon receiving the signal. On the other hand, the process exits immediately upon Zookeeper disconnect.
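
For anyone hitting the same race, here is a minimal sketch of one way to keep a systemctl stop from being reported as a failure: a SIGTERM handler that runs synchronously and only sets a flag, plus a check of that flag at the point where the application decides to exit on disconnect. The names install_sigterm_handler and exit_code_on_disconnect are hypothetical helpers, not part of the Zookeeper client API, and this assumes the exit-with-116 decision is made in the application's own watcher code. In the real application the shutdown path would presumably also call zookeeper_close() on the handle before exiting.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdlib.h>
#include <string.h>

/* Set from the signal handler. volatile sig_atomic_t is a type that is
 * safe to write from inside a handler. */
static volatile sig_atomic_t shutdown_requested = 0;

/* Runs synchronously when SIGTERM is delivered; it only records the request. */
static void on_sigterm(int signo)
{
    (void)signo;
    shutdown_requested = 1;
}

/* Hypothetical helper: install the SIGTERM handler.
 * Returns 0 on success, -1 on failure (same as sigaction). */
int install_sigterm_handler(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigterm;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGTERM, &sa, NULL);
}

/* Hypothetical helper: meant to be called from the session watcher when the
 * connection is lost. A disconnect that follows a requested shutdown is
 * treated as a clean exit; any other disconnect keeps exit code 116. */
int exit_code_on_disconnect(void)
{
    return shutdown_requested ? EXIT_SUCCESS : 116;
}
```

The watcher would then call exit(exit_code_on_disconnect()) instead of exiting with 116 unconditionally, so genuine disconnects still surface as failures and SuccessExitStatus=116 is not needed.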