Q : Is there a way to have a PUSH/PULL infrastructure that guarantees all sent messages will be received by the PULL socket, without inflating its memory or blocking the sender?
A way? Yes, there is:
The built-in Zero-Warranty ( covering a message being either delivered as a 1:1 bit-copy of the original, or not at all ) will need to get extended - either by an application-level protocol ( covering re-sends of messages not delivered, until confirmed ), or by moving your infrastructure onto a specific guaranteed-delivery protocol that helps with this above-standard requirement - use the norm:// transport-class extension and, in case PUSH/PULL is still not in an RTO-state there, move the paradigm into the PUB/SUB or XPUB/XSUB Scalable Formal Communication Pattern Archetypes.
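For illustration only, a minimal sketch of such an application-level re-send-until-confirmed extension, assuming pyzmq; the data-plane / ACK side-channel split, the sequence numbering, the endpoints and the 1 s retry timeout are all illustrative assumptions, not a part of the ZeroMQ API:

```python
# A sketch of an application-level re-send-until-confirmed protocol.
# Assumptions: pyzmq; a PUSH/PULL data-plane plus a PULL/PUSH ACK side-channel;
# seq-numbers, endpoints and the 1 s retry timeout are illustrative only.
import time
import zmq

ctx  = zmq.Context()
data = ctx.socket(zmq.PUSH); data.connect("tcp://localhost:5557")
acks = ctx.socket(zmq.PULL); acks.bind("tcp://*:5558")

pending = {}                                   # seq -> ( payload, last-sent time )

def send_guaranteed(seq, payload):
    pending[seq] = (payload, time.monotonic())
    data.send_multipart([str(seq).encode(), payload])

def pump_acks_and_resend(timeout_s=1.0):
    while acks.poll(0):                        # drain arrived confirmations
        pending.pop(int(acks.recv()), None)
    now = time.monotonic()
    for seq, (payload, sent) in list(pending.items()):
        if now - sent > timeout_s:             # not confirmed in time: re-send
            send_guaranteed(seq, payload)
```

The receiving side would PULL the data, process it, and PUSH each seq-number back over the side-channel; the sender's memory then only holds the not-yet-confirmed tail, not the whole history.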
A new transport option is available in libzmq. The "norm_engine.hpp" and "norm_engine.cpp" files provide an implementation of a NACK-Oriented Reliable Multicast (NORM) transport protocol option for ZeroMQ. NORM is an IETF open-standards protocol, specified in RFC 5740 and supporting documents. The Naval Research Laboratory (NRL) provides an open source reference implementation that is hosted at http://www.nrl.navy.mil/itd/ncs/products/norm.
NORM supports reliable data delivery over IP multicast but also supports unicast (point-to-point) data transfers. NORM operates on top of the User Datagram Protocol (UDP) and supports reliability via a NACK-based Automated Repeat Request (ARQ) that uses packet erasure coding for very efficient group communication. NORM also provides automated TCP-friendly congestion control and mechanisms to support end-to-end flow control. The NRL NORM implementation can also be configured to provide a basic UDP-like best-effort transport service (with no receiver feedback), and this can be enhanced by adding some amount of application-settable proactive forward error correction (FEC) packets to the transmission. I.e., by default NORM only sends 'reactive' FEC repair packets in response to NACKs, but it can also be configured to proactively send added repair packets for a level of reliability without any feedback from the receivers. In addition to its TCP-friendly congestion control, NORM can also be configured for fixed-rate operation, and the NRL implementation supports some additional automated congestion control options suitable for use in bit-error-prone wireless communication environments. While its reliable ARQ operation is principally NACK-based (negative acknowledgement when packet loss is detected), it also supports optional positive acknowledgment (ACK) from receivers that can be used for delivery confirmation and explicit flow control.
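A minimal sketch of wiring such a norm:// endpoint, assuming a libzmq build configured with NORM support and a pyzmq binding exposing that build; the multicast group address and port are illustrative assumptions:

```python
# A sketch only: requires a libzmq build configured with NORM support;
# the multicast group address and port below are illustrative assumptions.
import zmq

ctx = zmq.Context()

pub = ctx.socket(zmq.PUB)
pub.bind("norm://224.1.2.3:6003")      # NORM-carried, NACK-repaired multicast

sub = ctx.socket(zmq.SUB)
sub.setsockopt(zmq.SUBSCRIBE, b"")     # receive everything
sub.connect("norm://224.1.2.3:6003")
```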
The inflating-memory requirement has two ways forward: one - an explicit control for the .send()-er, not to flood the .send()-er side Context()-instance's resources ( RAM ), i.e. working within the restricted-resources limitations ( principally preventing any flooding / discarded-messages from happening at all ); the other - having sufficient RAM and a correctly configured Context()-instance, to let all the data flow through.
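Both ways map onto ordinary socket options; a minimal sketch, assuming pyzmq, with queue sizes that are illustrative only and to be tuned against the actual RAM budget:

```python
# A sketch of the two ways forward, assuming pyzmq; sizes are illustrative.
import zmq

ctx  = zmq.Context()
push = ctx.socket(zmq.PUSH)

# Way one: a restricted-resources policy - keep a hard, small local queue,
# so the sender-side Context() RAM can never get flooded
push.setsockopt(zmq.SNDHWM, 1_000)

# Way two: a sufficient-resources policy - let everything flow through
# ( 0 means "no limit"; the Context() may then grow as large as RAM permits )
# push.setsockopt(zmq.SNDHWM, 0)

push.connect("tcp://localhost:5557")
```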
Q : What is the real behavior when the HWM is reached on no-block PUSH/PULL ZeroMQ sockets?
First, let's demystify this. The ZMQ_NOBLOCK-directive makes the local, .send()-side Context() return the call to the .send()-method back to the caller immediately, i.e. without blocking the calling code-execution ( the message-payload is handed over for further processing inside the local ZeroMQ Context()-instance if its internal state permits, otherwise the call returns straight away with an EAGAIN error - a classical non-blocking code-design ).
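A minimal sketch of that non-blocking call, assuming pyzmq ( where the flag is spelled zmq.NOBLOCK, an alias of zmq.DONTWAIT, and the endpoint is illustrative ):

```python
# A sketch of a non-blocking .send(), assuming pyzmq.
import zmq

ctx  = zmq.Context()
push = ctx.socket(zmq.PUSH)
push.connect("tcp://localhost:5557")

try:
    push.send(b"payload", flags=zmq.NOBLOCK)   # returns immediately, never blocks
except zmq.Again:
    # the Context() could not queue the message right now ( HWM reached
    # or no peer ) - the caller decides: drop, retry later, or persist
    pass
```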
The ZMQ_SNDHWM, on the contrary, instructs the Context()-instance how this socket's thresholds are to be set, and for the said PUSH/PULL .send()-er case:
The high water mark is a hard limit on the maximum number of outstanding messages ØMQ shall queue in memory for any single peer that the specified socket is communicating with. A value of zero means no limit.
If this limit has been reached the socket shall enter an exceptional state and depending on the socket type, ØMQ shall take appropriate action such as blocking or dropping sent messages. Refer to the individual socket descriptions in zmq_socket(3) for details on the exact action taken for each socket type.
ØMQ does not guarantee that the socket will accept as many as ZMQ_SNDHWM messages, and the actual limit may be as much as 60-70% lower depending on the flow of messages on the socket.
Also using ZMQ_XPUB_NODROP may help for the norm:// transport-class use-cases.
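A sketch of setting it, assuming pyzmq; it applies to the publishing side and makes a full HWM surface as a zmq.Again error instead of a silent message drop ( endpoint again illustrative ):

```python
# A sketch only, assuming pyzmq; the endpoint is illustrative.
import zmq

ctx  = zmq.Context()
xpub = ctx.socket(zmq.XPUB)
xpub.setsockopt(zmq.XPUB_NODROP, 1)   # do not silently drop at HWM -
xpub.bind("norm://224.1.2.3:6003")    # report EAGAIN / zmq.Again instead
```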
Be also warned that, by default, the ZMQ_PUSH-socket's API confirms that:
When a ZMQ_PUSH socket enters the mute state due to having reached the high water mark for all downstream nodes, or if there are no downstream nodes at all, then any zmq_send(3) operations on the socket shall block until the mute state ends or at least one downstream node becomes available for sending; messages are not discarded.
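If an unbounded block is not acceptable, the blocking can be time-bounded with ZMQ_SNDTIMEO; a minimal sketch, assuming pyzmq, where the 100 ms limit is illustrative:

```python
# A sketch of time-bounding the mute-state block, assuming pyzmq.
import zmq

ctx  = zmq.Context()
push = ctx.socket(zmq.PUSH)
push.setsockopt(zmq.SNDTIMEO, 100)     # block at most 100 ms ( illustrative )
push.connect("tcp://localhost:5557")

try:
    push.send(b"payload")               # blocks up to 100 ms in mute state
except zmq.Again:
    pass                                # mute state persisted past the limit
```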
For the underperforming suspect ( the PULL-side ), also test for properly sized settings on the O/S-side, using the .getsockopt( ZMQ_RCVBUF )-method and adapting the sizing with a proper, large-enough .setsockopt( ZMQ_RCVBUF ), as needed:
The ZMQ_RCVBUF option shall set the underlying kernel receive buffer size for the socket to the specified size in bytes. A value of -1 means leave the OS default unchanged. For details refer to your operating system documentation for the SO_RCVBUF socket option.
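A minimal sketch of that inspect-and-adapt step, assuming pyzmq; the 4 MB size is illustrative, and the O/S may still cap SO_RCVBUF by its own limits:

```python
# A sketch of inspecting and enlarging the kernel receive buffer, assuming pyzmq.
import zmq

ctx  = zmq.Context()
pull = ctx.socket(zmq.PULL)

print("ZMQ_RCVBUF was:", pull.getsockopt(zmq.RCVBUF))   # -1 means O/S default
pull.setsockopt(zmq.RCVBUF, 4 * 1024 * 1024)            # illustrative 4 MB
pull.bind("tcp://*:5557")
```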
If nothing above has helped, one may inject a self-diagnosing meta-plane into the ZeroMQ infrastructure, using the zmq_socket_monitor services, and gain full control over situations that normally happen out of sight of the application code ( reflecting internal API-states and transitions on an as-needed basis ).
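A minimal sketch, assuming pyzmq, which wraps zmq_socket_monitor in its .get_monitor_socket() helper; the endpoint and the ~1 s watch window are illustrative:

```python
# A sketch of the self-diagnosing meta-plane, assuming pyzmq's
# .get_monitor_socket() wrapper around zmq_socket_monitor.
import zmq
from zmq.utils.monitor import recv_monitor_message

ctx  = zmq.Context()
push = ctx.socket(zmq.PUSH)
mon  = push.get_monitor_socket()        # a PAIR socket fed with API-state events
push.connect("tcp://localhost:5557")

while mon.poll(1000):                   # watch transitions for ~1 s ( illustrative )
    evt = recv_monitor_message(mon)     # dict with 'event', 'value', 'endpoint'
    print(evt["event"], evt["endpoint"])
```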
The decision is yours.