I have a server which uses a ZMQ_ROUTER socket to communicate with ZMQ_DEALER clients. I set the ZMQ_HEARTBEAT_IVL and ZMQ_HEARTBEAT_TTL options on the client socket so that the client and server ping-pong each other. In addition, because of the ZMQ_HEARTBEAT_TTL option, the server should time out the connection if it does not receive any pings from the client within the TTL period, according to the zmq man page:
The ZMQ_HEARTBEAT_TTL option shall set the timeout on the remote peer for ZMTP heartbeats. If this option is greater than 0, the remote side shall time out the connection if it does not receive any more traffic within the TTL period. This option does not have any effect if ZMQ_HEARTBEAT_IVL is not set or is 0. Internally, this value is rounded down to the nearest decisecond, any value less than 100 will have no effect.
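To make the rounding rule in the quoted passage concrete, here is a minimal sketch of the arithmetic. The helper name effective_ttl_ms is hypothetical and not part of libzmq; it only mirrors what the man page describes (round down to a multiple of 100 ms, with anything below 100 ms having no effect).

```c
#include <assert.h>

/* Hypothetical helper (not a libzmq function): mirrors the man page's
 * description of how ZMQ_HEARTBEAT_TTL is treated internally.
 * Returns the TTL in milliseconds after rounding down to the nearest
 * decisecond; a result of 0 means the TTL has no effect. */
int effective_ttl_ms(int ttl_ms)
{
    assert(ttl_ms >= 0); /* a negative TTL is not meaningful in this sketch */
    return (ttl_ms / 100) * 100;
}
```

Under this reading, the 6000 ms used in the client below stays 6000 ms, a value of 6050 would be treated as 6000, and a value of 99 would be treated as 0 (no TTL).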
Therefore, I expect that when the server does not receive any traffic from a client within the TTL period, it will close the connection to that client and discard all the messages in the outgoing queue after the linger time expires. I created a toy example to check whether my hypothesis is correct, and it turns out that it is not. The chain of events is as follows:
- The server sends a bunch of data to the client.
- The client receives and processes the data, which is slow.
- All send commands return successfully.
- While the client is still receiving the data, I unplug the internet cable.
- After a few seconds (set by the ZMQ_HEARTBEAT_TTL option), the server starts sending FIN signals to the client, which are not ACKed.
- The outgoing messages are not discarded (I checked the memory consumption), even after a while. They are discarded only if I call zmq_close on the router socket.
So my question is: is this how the ZMQ heartbeat mechanism is supposed to be used? If not, is there any solution for what I want to achieve? I figure that I could implement heartbeating myself instead of using ZMQ's built-in mechanism. However, even if I do, it seems that ZMQ does not provide a way to close a connection between a ZMQ_ROUTER and a ZMQ_DEALER, although a variant of ZMQ_ROUTER, ZMQ_STREAM, provides a way to do this by sending an identity frame followed by an empty frame.
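For reference, the "do heartbeat myself" idea could be sketched as a small liveness table on the server, keyed by the client's routing ID. Everything here (peer_table_t, peers_touch, peers_expire, peers_is_alive, the table size) is hypothetical bookkeeping rather than libzmq API, and the caller is assumed to supply timestamps in milliseconds. Note that this only tells the application which clients have gone quiet so that it can stop sending to them; it does not by itself release messages libzmq has already queued for that peer, which is the part I am stuck on.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MAX_PEERS  16
#define MAX_ID_LEN 255 /* assumed upper bound on a routing ID's length */

/* Hypothetical per-client record for an application-level heartbeat. */
typedef struct {
    unsigned char id[MAX_ID_LEN];
    size_t id_len;
    int64_t last_seen_ms;
    int in_use;
} peer_t;

typedef struct {
    peer_t peers[MAX_PEERS];
} peer_table_t;

void peers_init(peer_table_t *t)
{
    memset(t, 0, sizeof *t);
}

/* Record that any frame arrived from `id` at time now_ms.
 * Returns 0 on success, -1 if the ID is invalid or the table is full. */
int peers_touch(peer_table_t *t, const void *id, size_t id_len, int64_t now_ms)
{
    assert(t != NULL && id != NULL);
    if (id_len == 0 || id_len > MAX_ID_LEN)
        return -1;
    peer_t *free_slot = NULL;
    for (int i = 0; i < MAX_PEERS; ++i) {
        peer_t *p = &t->peers[i];
        if (p->in_use && p->id_len == id_len && memcmp(p->id, id, id_len) == 0) {
            p->last_seen_ms = now_ms; /* known peer: refresh its timestamp */
            return 0;
        }
        if (!p->in_use && free_slot == NULL)
            free_slot = p;
    }
    if (free_slot == NULL)
        return -1;
    memcpy(free_slot->id, id, id_len);
    free_slot->id_len = id_len;
    free_slot->last_seen_ms = now_ms;
    free_slot->in_use = 1;
    return 0;
}

/* Forget every peer silent for longer than timeout_ms.
 * Returns the number of peers that were expired. */
int peers_expire(peer_table_t *t, int64_t now_ms, int64_t timeout_ms)
{
    int expired = 0;
    for (int i = 0; i < MAX_PEERS; ++i) {
        peer_t *p = &t->peers[i];
        if (p->in_use && now_ms - p->last_seen_ms > timeout_ms) {
            p->in_use = 0;
            ++expired;
        }
    }
    return expired;
}

/* Returns 1 if `id` is currently tracked as alive, 0 otherwise. */
int peers_is_alive(const peer_table_t *t, const void *id, size_t id_len)
{
    for (int i = 0; i < MAX_PEERS; ++i) {
        const peer_t *p = &t->peers[i];
        if (p->in_use && p->id_len == id_len && memcmp(p->id, id, id_len) == 0)
            return 1;
    }
    return 0;
}
```

The server would call peers_touch on every received message, call peers_expire periodically with the chosen timeout, and check peers_is_alive before each send.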
The toy example is below; any help would be appreciated.
Server's side:
#include <zmq.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(int argc, char **argv)
{
    if (argc < 3) {
        fprintf(stderr, "usage: %s <bind-address> <port>\n", argv[0]);
        return 1;
    }

    void *context = zmq_ctx_new();
    void *router = zmq_socket(context, ZMQ_ROUTER);

    // Make sends to unroutable peers fail with an error instead of being dropped silently
    int router_mandatory = 1;
    zmq_setsockopt(router, ZMQ_ROUTER_MANDATORY, &router_mandatory, sizeof(router_mandatory));

    // 0 = no limit on the number of queued outgoing messages
    int hwm = 0;
    zmq_setsockopt(router, ZMQ_SNDHWM, &hwm, sizeof(hwm));

    // Keep unsent messages for up to 3 seconds after the socket is closed
    int linger = 3000;
    zmq_setsockopt(router, ZMQ_LINGER, &linger, sizeof(linger));

    char bind_addr[1024];
    snprintf(bind_addr, sizeof(bind_addr), "tcp://%s:%s", argv[1], argv[2]);
    if (zmq_bind(router, bind_addr) == -1) {
        perror("ERROR");
        exit(1);
    }

    // Receive client identity (only 1)
    zmq_msg_t identity;
    zmq_msg_init(&identity);
    if (zmq_msg_recv(&identity, router, 0) == -1) {
        perror("ERROR");
        exit(1);
    }

    zmq_msg_t dump;
    zmq_msg_init(&dump);
    if (zmq_msg_recv(&dump, router, 0) == -1) {
        perror("ERROR");
        exit(1);
    }
    // Print at most the received bytes; do not rely on a trailing NUL
    printf("%.*s\n", (int) zmq_msg_size(&dump), (char *) zmq_msg_data(&dump)); // hello
    zmq_msg_close(&dump);

    char buff[1 << 16] = {0}; // zero-filled 64 KiB payload

    for (int i = 0; i < 50000; ++i) {
        // First frame: the routing ID of the destination client
        if (zmq_send(router, zmq_msg_data(&identity),
                     zmq_msg_size(&identity),
                     ZMQ_SNDMORE) == -1) {
            perror("ERROR");
            exit(1);
        }
        // Second frame: the payload
        if (zmq_send(router, buff, sizeof(buff), 0) == -1) {
            perror("ERROR");
            exit(1);
        }
    }

    printf("OK IM DONE SENDING\n");

    // All send commands have returned successfully
    // While the client is still receiving data, I unplug the internet cable on the client machine
    // After a while, the server starts sending FIN signals
    printf("SLEEP before closing\n"); // At this point, the messages are not discarded (memory usage is high).
    getchar();

    zmq_msg_close(&identity);
    zmq_close(router);
    zmq_ctx_destroy(context);
    return 0;
}
Client's side:
#include <zmq.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(int argc, char **argv)
{
    if (argc < 3) {
        fprintf(stderr, "usage: %s <server-address> <port>\n", argv[0]);
        return 1;
    }

    void *context = zmq_ctx_new();
    void *dealer = zmq_socket(context, ZMQ_DEALER);

    int heartbeat_ivl = 3000;     // send a ZMTP ping every 3 seconds
    int heartbeat_timeout = 6000; // used for both the local timeout and the TTL sent to the server
    zmq_setsockopt(dealer, ZMQ_HEARTBEAT_IVL, &heartbeat_ivl, sizeof(heartbeat_ivl));
    zmq_setsockopt(dealer, ZMQ_HEARTBEAT_TIMEOUT, &heartbeat_timeout, sizeof(heartbeat_timeout));
    zmq_setsockopt(dealer, ZMQ_HEARTBEAT_TTL, &heartbeat_timeout, sizeof(heartbeat_timeout));

    // 0 = no limit on the number of queued incoming messages
    int hwm = 0;
    zmq_setsockopt(dealer, ZMQ_RCVHWM, &hwm, sizeof(hwm));

    char connect_addr[1024];
    snprintf(connect_addr, sizeof(connect_addr), "tcp://%s:%s", argv[1], argv[2]);
    if (zmq_connect(dealer, connect_addr) == -1) {
        perror("ERROR");
        exit(1);
    }

    zmq_send(dealer, "hello", 6, 0); // 6 bytes: includes the trailing NUL

    // Use a 64-bit counter so the total cannot overflow on 32-bit platforms
    const unsigned long long expected = (1ull << 16) * 50000;
    unsigned long long size = 0;
    int i = 0;

    while (size < expected) {
        zmq_msg_t msg;
        zmq_msg_init(&msg);
        if (zmq_msg_recv(&msg, dealer, 0) == -1) {
            perror("ERROR");
            exit(1);
        }
        size += zmq_msg_size(&msg);
        printf("i = %d, size = %zu, total = %llu\n", i, zmq_msg_size(&msg), size); // This causes the client to be slow
        // Somewhere in this loop I unplug the internet cable.
        // The client starts sending FIN signals as well as trying to reconnect. The recv command hangs forever.
        zmq_msg_close(&msg);
        ++i;
    }

    zmq_close(dealer);
    zmq_ctx_destroy(context);
    return 0;
}
PS: I know that setting the high-water mark to unlimited is bad practice; however, I figure that the problem would be the same even with a low high-water mark, so let's ignore it for now.
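To put a number on the memory I am watching, here is a rough sketch of the arithmetic for payload bytes held in the outgoing queue. The helper queued_payload_bytes is hypothetical and ignores per-message overhead inside libzmq, so treat the results as lower bounds rather than exact figures.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: payload bytes held in memory if `count` messages
 * of `msg_size` bytes each are all still queued (overhead not included). */
uint64_t queued_payload_bytes(uint64_t count, uint64_t msg_size)
{
    /* Guard against overflow of the 64-bit product. */
    assert(msg_size == 0 || count <= UINT64_MAX / msg_size);
    return count * msg_size;
}
```

For the toy server above, 50000 messages of 1 << 16 bytes come to 3,276,800,000 bytes (about 3.05 GiB) if none are delivered. With a send high-water mark of 1000 messages, the same calculation gives 65,536,000 bytes (62.5 MiB) per stalled client, assuming the mark is honored exactly, which as far as I understand libzmq does not strictly guarantee.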