Q : Is there a way to have a PUSH/PULL infrastructure that guarantees all sent messages will be received by the PULL socket, without inflating its memory or blocking the sender?
A way? Yes, there is:
The built-in Zero-Warranty ( covering a message being either delivered as a 1:1 bit-copy of the original, or not at all ) will need to get extended - either by an application-level protocol ( covering re-sends of messages not delivered, until confirmed ), or by moving your infrastructure onto a specific guaranteed-delivery protocol that helps with this above-standard requirement - use the norm:// transport-class extension and, in case PUSH/PULL is still not in an RTO-state there, move the paradigm into the PUB/SUB or XPUB/XSUB Scalable Formal Communication Pattern Archetypes.
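For illustration only, a minimal sketch of such an application-level re-send-until-confirmed extension, assuming pyzmq; the data-plane / ACK side-channel split, the sequence numbering, the endpoints and the 1 s retry timeout are all illustrative assumptions, not a part of the ZeroMQ API:

```python
# A sketch of an application-level re-send-until-confirmed protocol.
# Assumptions: pyzmq; a PUSH/PULL data-plane plus a PULL/PUSH ACK side-channel;
# seq-numbers, endpoints and the 1 s retry timeout are illustrative only.
import time
import zmq

ctx  = zmq.Context()
data = ctx.socket(zmq.PUSH); data.connect("tcp://localhost:5557")
acks = ctx.socket(zmq.PULL); acks.bind("tcp://*:5558")

pending = {}                                   # seq -> ( payload, last-sent time )

def send_guaranteed(seq, payload):
    pending[seq] = (payload, time.monotonic())
    data.send_multipart([str(seq).encode(), payload])

def pump_acks_and_resend(timeout_s=1.0):
    while acks.poll(0):                        # drain arrived confirmations
        pending.pop(int(acks.recv()), None)
    now = time.monotonic()
    for seq, (payload, sent) in list(pending.items()):
        if now - sent > timeout_s:             # not confirmed in time: re-send
            send_guaranteed(seq, payload)
```

The receiving side would PULL the data, process it, and PUSH each seq-number back over the side-channel; the sender's memory then only holds the not-yet-confirmed tail, not the whole history.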
A new transport option is available in libzmq. The "norm_engine.hpp" and "norm_engine.cpp" files provide an implementation of a NACK-Oriented Reliable Multicast (NORM) transport protocol option for ZeroMQ. NORM is an IETF open-standards protocol, specified in RFC 5740 and supporting documents. The Naval Research Laboratory (NRL) provides an open source reference implementation that is hosted at http://www.nrl.navy.mil/itd/ncs/products/norm.
NORM supports reliable data delivery over IP multicast but also supports unicast (point-to-point) data transfers. NORM operates on top of the User Datagram Protocol (UDP) and supports reliability via a NACK-based Automated Repeat Request (ARQ) that uses packet erasure coding for very efficient group communication. NORM also provides automated TCP-friendly congestion control and mechanisms to support end-to-end flow control. The NRL NORM implementation can also be configured to provide a basic UDP-like best-effort transport service (with no receiver feedback), and this can be enhanced by adding some amount of application-settable proactive forward error correction (FEC) packets to the transmission. I.e., by default NORM only sends 'reactive' FEC repair packets in response to NACKs, but it can also be configured to proactively send added repair packets for a level of reliability without any feedback from the receivers. In addition to its TCP-friendly congestion control, NORM can also be configured for fixed-rate operation, and the NRL implementation supports some additional automated congestion control options suitable for use in bit-error-prone wireless communication environments. While its reliable ARQ operation is principally NACK-based (negative acknowledgement when packet loss is detected), it also supports optional positive acknowledgment (ACK) from receivers that can be used for delivery confirmation and explicit flow control.
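A minimal sketch of wiring such a norm:// endpoint, assuming a libzmq build configured with NORM support and a pyzmq binding exposing that build; the multicast group address and port are illustrative assumptions:

```python
# A sketch only: requires a libzmq build configured with NORM support;
# the multicast group address and port below are illustrative assumptions.
import zmq

ctx = zmq.Context()

pub = ctx.socket(zmq.PUB)
pub.bind("norm://224.1.2.3:6003")      # NORM-carried, NACK-repaired multicast

sub = ctx.socket(zmq.SUB)
sub.setsockopt(zmq.SUBSCRIBE, b"")     # receive everything
sub.connect("norm://224.1.2.3:6003")
```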
The inflating-memory requirement has two ways forward: one - an explicit control for the .send()-er, not to flood the .send()-er side Context()-instance's resources ( RAM ), i.e. working within the restricted-resources limitations ( principally preventing any flooding / discarded-messages from happening at all ); the other - having sufficient RAM and a correctly configured Context()-instance, to let all the data flow through.
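Both ways map onto ordinary socket options; a minimal sketch, assuming pyzmq, with queue sizes that are illustrative only and to be tuned against the actual RAM budget:

```python
# A sketch of the two ways forward, assuming pyzmq; sizes are illustrative.
import zmq

ctx  = zmq.Context()
push = ctx.socket(zmq.PUSH)

# Way one: a restricted-resources policy - keep a hard, small local queue,
# so the sender-side Context() RAM can never get flooded
push.setsockopt(zmq.SNDHWM, 1_000)

# Way two: a sufficient-resources policy - let everything flow through
# ( 0 means "no limit"; the Context() may then grow as large as RAM permits )
# push.setsockopt(zmq.SNDHWM, 0)

push.connect("tcp://localhost:5557")
```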
Q : What is the real behavior when the HWM is reached on no-block PUSH/PULL ZeroMQ sockets?
First, let's demystify this. The ZMQ_NOBLOCK-directive makes the local, .send()-side Context() return the call to the .send()-method back to the caller immediately, i.e. without blocking the calling code-execution ( the message-payload is handed over for further processing inside the local ZeroMQ Context()-instance if its internal state permits, otherwise the call returns straight away with an EAGAIN error - a classical non-blocking code-design ).
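A minimal sketch of that non-blocking call, assuming pyzmq ( where the flag is spelled zmq.NOBLOCK, an alias of zmq.DONTWAIT, and the endpoint is illustrative ):

```python
# A sketch of a non-blocking .send(), assuming pyzmq.
import zmq

ctx  = zmq.Context()
push = ctx.socket(zmq.PUSH)
push.connect("tcp://localhost:5557")

try:
    push.send(b"payload", flags=zmq.NOBLOCK)   # returns immediately, never blocks
except zmq.Again:
    # the Context() could not queue the message right now ( HWM reached
    # or no peer ) - the caller decides: drop, retry later, or persist
    pass
```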
The ZMQ_SNDHWM, on the contrary, instructs the Context()-instance how this socket's thresholds are to be set, and for the said PUSH/PULL .send()-er case:
The high water mark is a hard limit on the maximum number of outstanding messages ØMQ shall queue in memory for any single peer that the specified socket is communicating with. A value of zero means no limit.
If this limit has been reached the socket shall enter an exceptional state and depending on the socket type, ØMQ shall take appropriate action such as blocking or dropping sent messages. Refer to the individual socket descriptions in zmq_socket(3) for details on the exact action taken for each socket type.
ØMQ does not guarantee that the socket will accept as many as ZMQ_SNDHWM messages, and the actual limit may be as much as 60-70% lower depending on the flow of messages on the socket.
Also using ZMQ_XPUB_NODROP may help for the norm:// transport-class use-cases.
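A sketch of setting it, assuming pyzmq; it applies to the publishing side and makes a full HWM surface as a zmq.Again error instead of a silent message drop ( endpoint again illustrative ):

```python
# A sketch only, assuming pyzmq; the endpoint is illustrative.
import zmq

ctx  = zmq.Context()
xpub = ctx.socket(zmq.XPUB)
xpub.setsockopt(zmq.XPUB_NODROP, 1)   # do not silently drop at HWM -
xpub.bind("norm://224.1.2.3:6003")    # report EAGAIN / zmq.Again instead
```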
Be also warned that, by default, the ZMQ_PUSH-socket's API confirms that:
When a ZMQ_PUSH socket enters the mute state due to having reached the high water mark for all downstream nodes, or if there are no downstream nodes at all, then any zmq_send(3) operations on the socket shall block until the mute state ends or at least one downstream node becomes available for sending; messages are not discarded.
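If an unbounded block is not acceptable, the blocking can be time-bounded with ZMQ_SNDTIMEO; a minimal sketch, assuming pyzmq, where the 100 ms limit is illustrative:

```python
# A sketch of time-bounding the mute-state block, assuming pyzmq.
import zmq

ctx  = zmq.Context()
push = ctx.socket(zmq.PUSH)
push.setsockopt(zmq.SNDTIMEO, 100)     # block at most 100 ms ( illustrative )
push.connect("tcp://localhost:5557")

try:
    push.send(b"payload")               # blocks up to 100 ms in mute state
except zmq.Again:
    pass                                # mute state persisted past the limit
```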
For the underperforming suspect ( the PULL-side ), also test for properly sized settings on the O/S-side, using the .getsockopt( ZMQ_RCVBUF )-method and adapting the sizing with a proper, large-enough .setsockopt( ZMQ_RCVBUF ), as needed:
The ZMQ_RCVBUF option shall set the underlying kernel receive buffer size for the socket to the specified size in bytes. A value of -1 means leave the OS default unchanged. For details refer to your operating system documentation for the SO_RCVBUF socket option.
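A minimal sketch of that inspect-and-adapt step, assuming pyzmq; the 4 MB size is illustrative, and the O/S may still cap SO_RCVBUF by its own limits:

```python
# A sketch of inspecting and enlarging the kernel receive buffer, assuming pyzmq.
import zmq

ctx  = zmq.Context()
pull = ctx.socket(zmq.PULL)

print("ZMQ_RCVBUF was:", pull.getsockopt(zmq.RCVBUF))   # -1 means O/S default
pull.setsockopt(zmq.RCVBUF, 4 * 1024 * 1024)            # illustrative 4 MB
pull.bind("tcp://*:5557")
```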
If nothing above has helped, one may inject a self-diagnosing meta-plane into the ZeroMQ infrastructure, using the zmq_socket_monitor services, and gain full control over situations that normally happen out of sight of the application code ( reflecting internal API-states and transitions on an as-needed basis ).
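A minimal sketch, assuming pyzmq, which wraps zmq_socket_monitor in its .get_monitor_socket() helper; the endpoint and the ~1 s watch window are illustrative:

```python
# A sketch of the self-diagnosing meta-plane, assuming pyzmq's
# .get_monitor_socket() wrapper around zmq_socket_monitor.
import zmq
from zmq.utils.monitor import recv_monitor_message

ctx  = zmq.Context()
push = ctx.socket(zmq.PUSH)
mon  = push.get_monitor_socket()        # a PAIR socket fed with API-state events
push.connect("tcp://localhost:5557")

while mon.poll(1000):                   # watch transitions for ~1 s ( illustrative )
    evt = recv_monitor_message(mon)     # dict with 'event', 'value', 'endpoint'
    print(evt["event"], evt["endpoint"])
```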
The decision is yours.