I have the following setup: remote devices running active zabbix_agentd (version 2.0) using socat to tunnel through an HTTPS proxy.
On the server side: Apache with a proxy service allowing CONNECT to localhost:10051 (zabbix_proxy). The connection is encrypted with SSL and requires a valid client certificate.
On the client side: Socat beta8 command line:
socat -d -d -ly "TCP-LISTEN:10051,bind=127.0.0.1,reuseaddr,fork" "PROXY:127.0.0.1:10051,connect-timeout=30 | OPENSSL:<server_domain_name>:443,connect-timeout=30,cafile=<CA_CERT_FILE>,certificate=<CLIENT_CERT_FILE>"
zabbix_agentd is configured to work in active mode only and to connect to localhost:10051.
Problem: on a small minority of machines, some connections don't close properly, and the socat child process hangs with its TCP socket in the CLOSE_WAIT state. The socket in question has the local endpoint 127.0.0.1:10051, so it is socat's side of the connection from zabbix_agentd. Since CLOSE_WAIT means the peer has already closed its end, it looks like zabbix_agentd has disconnected and the socat child never closes its own socket. The hanging socat processes consume a lot of CPU and eventually crash the system. The only way to clear them is with SIGKILL.
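For reference, here is a minimal sketch of how the stuck sockets can be counted on an affected machine. It assumes Linux on a little-endian host, where /proc/net/tcp shows 127.0.0.1:10051 as 0100007F:2743 and the CLOSE_WAIT state as 08. The function name count_close_wait is just a hypothetical helper and not part of socat or Zabbix.

```shell
# Count TCP sockets in CLOSE_WAIT whose local endpoint is 127.0.0.1:10051.
# Expects input in /proc/net/tcp format on stdin.
# Assumption: little-endian host, so 127.0.0.1 is printed as 0100007F.
# Port 10051 is 0x2743; state 08 is CLOSE_WAIT.
count_close_wait() {
    awk '$2 == "0100007F:2743" && $4 == "08" { n++ } END { print n + 0 }'
}
```

On an affected machine it would be run as `count_close_wait < /proc/net/tcp`; a count that stays above zero matches the hanging socat children described above.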
Any recommendations on dealing with this problem, besides periodically killing hanging processes?
Thanks.