
I have implemented a message bus in Linux for IPC using ZeroMQ (more specifically CZMQ). Here is what I have implemented.

My question is, how do I know that send dropped the message when the publisher buffer is full?

In my simple test setup, I am using publisher-subscriber with a proxy. I have a fast sender and a very slow receiver, which causes messages to hit the HWM and get dropped on send. My expectation was that send would fail with a 'message dropped' error, but that is not the case. zmq_msg_send() does not return any error even though messages are being dropped (I can verify this by seeing gaps in the messages on the subscriber end).

How can I know when the messages get dropped? If this is the intended behaviour and ZeroMQ does not let us know that, what is a workaround to find if my send dropped the message?

fortytwo

2 Answers

1

In recent versions of ZeroMQ, PUB/SUB sockets default to a high-water mark (ZMQ_SNDHWM/ZMQ_RCVHWM) of 1000 messages.

What this means is that if you send a burst of more than 1000 messages in a tight loop, it will probably drop some. It is simple to write a test that shows this: give each message a payload containing a sequence number.

One option is to set both the HWMs to 0. This will mean it's infinite.

You can play about with this using some examples I wrote recently:

https://gist.github.com/easytiger/992b3a29eb5c8545d289

https://gist.github.com/easytiger/e382502badab49856357

They publish and subscribe on a transport in a burst of messages. If you play with the HWM, you can see that in big bursts a non-zero HWM causes a great many messages to be dropped.

Jonas Greitemann
easytiger
  • Thanks @easytiger. I looked into fixing the slow subscriber problem by adding a sequence number as you suggested, especially the [suicidal snail pattern](http://zguide.zeromq.org/page:all#Slow-Subscriber-Detection-Suicidal-Snail-Pattern) in the guide. But my use case is a bit different. I really don't care about the subscribers; I want to let my publisher know if it's dropping messages because it has hit the HWM. – fortytwo Sep 15 '14 at 23:19
  • Legitimate concern. Considering how PUB/SUB operates, I think the HWM should default to infinity, with a callback mechanism to notify you when soft watermarks are reached so you can write application logic to react to the situation. I guess it all depends on your consumer speed/parallelism/scalability and your need for 100% reliability. Most applications using PUB/SUB don't need 100% reliability, I imagine, hence this design choice. I wonder whether it would be hard to register a callback with the socket to be invoked at a certain queue size? – easytiger Sep 16 '14 at 07:07
  • You mean bypassing the ZeroMQ layer and setting/checking queue sizes at the Linux socket level? – fortytwo Sep 17 '14 at 02:30
  • Well, I meant adding functionality to ZeroMQ that gives you a callback whenever it decides to drop anything. – easytiger Sep 17 '14 at 10:48
  • I assume it'd need some work so I am reluctant to touch ZeroMQ library; with the official 'it will make updates harder' reason (the real reason is plain laziness). – fortytwo Sep 18 '14 at 01:48
1

What you appear to be asking for is fault tolerance, for which PUB/SUB isn't ideal. Not only may the HWM be reached, but consider what happens if a subscribing client dies and gets restarted: it will miss every message the publisher sends while it is down. FWIW, in ZMQ v2 the default HWM was infinite for PUB/SUB, but it was changed to 1000 in v3 because systems were running out of memory when messages were queued faster than they could be sent. The value of 1000 seemed reasonable for bursts of messages when the average message rate was within the network bandwidth. YMMV.

If you just want to know when messages get dropped, it's as simple as adding an incrementing message number to each message and having the subscribers monitor it. You could place this number in its own frame or not; overall simplicity will be the decider. I don't believe it's possible to determine that messages were dropped specifically because the HWM was reached.
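A minimal sketch of that idea in plain C, independent of any ZeroMQ calls. It assumes the number is carried as a decimal prefix in the same frame (`"<seq> <body>"`); the type `seq_tracker` and the functions `stamp_message` and `track_message` are hypothetical names chosen for illustration.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical subscriber-side state for gap detection. */
typedef struct {
    uint64_t expected; /* next sequence number we expect to see */
    uint64_t dropped;  /* running total of missing messages */
    int started;       /* 0 until the first message has arrived */
} seq_tracker;

/* Publisher side: write "<seq> <body>" into buf (assumed payload layout).
 * Returns the value of snprintf. */
static int stamp_message(char *buf, size_t len, uint64_t seq, const char *body)
{
    return snprintf(buf, len, "%llu %s", (unsigned long long)seq, body);
}

/* Subscriber side: parse the sequence prefix and update the tracker.
 * Returns how many messages are missing just before this one (0 if none),
 * or -1 if the payload does not start with a number. The first message
 * received only sets the baseline, because a subscriber that joins late
 * never sees the earlier messages anyway. */
static long long track_message(seq_tracker *t, const char *msg)
{
    char *end = NULL;
    unsigned long long seq = strtoull(msg, &end, 10);
    if (end == msg)
        return -1;

    long long gap = 0;
    if (t->started && seq > t->expected) {
        gap = (long long)(seq - t->expected);
        t->dropped += (uint64_t)gap;
    }
    t->started = 1;
    t->expected = seq + 1;
    return gap;
}
```

The publisher would call `stamp_message` with a counter it increments on every send, and each subscriber would pass every received payload to `track_message`. This only reports that something was lost on the way to that subscriber; it cannot tell whether the cause was the HWM or something else.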

John Jefferies
  • Thanks @john. As I mentioned in my reply above, I looked into adding a sequence number as you suggested, especially the [suicidal snail pattern](http://zguide.zeromq.org/page:all#Slow-Subscriber-Detection-Suicidal-Snail-Pattern) in the guide. But my use case is a bit different. I really don't care about the subscribers; I want to let my publisher know if it's dropping messages because it has hit the HWM. I guess you are right; pub-sub might not be the best pattern for my requirements. I guess I will have to build reliability into my application on top of the ZeroMQ transport. – fortytwo Sep 15 '14 at 23:21
  • @John > "The 1000 seemed like a reasonable value for bursts of messages when the average message rate was within the network bandwidth." That's definitely not true. The point of ZeroMQ is that you don't then have to add ring buffers and two threads to your application. Even then, I'm pretty sure I'm nowhere near saturating my 10GB NICs with ZeroMQ reads before I start dropping in pub/sub. – easytiger Sep 16 '14 at 07:08