
I found that non-persistent messages are sometimes lost even though my Pulsar client is up and running. They are lost when throughput is high (more than 1000 messages within a very short period of time, which I personally do not consider high). If I increase the receiverQueueSize parameter or switch to persistent messages, the problem goes away.

I checked the Pulsar source code (I am not sure this is the latest version):

https://github.com/apache/pulsar/blob/35f0e13fc3385b54e88ddd8e62e44146cf3b060d/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/nonpersistent/NonPersistentDispatcherMultipleConsumers.java#L185

and I think that Pulsar simply discards non-persistent messages if no consumer is available to handle them when they arrive. "No consumer" here means

  • no consumer is subscribed to the topic
  • OR all consumers are busy processing messages received earlier

Is my understanding correct?

Jack Ng

1 Answer


The Pulsar broker does not buffer messages for non-persistent topics, so if consumers are not connected, or are connected but not keeping up with the producers, the messages are simply discarded.
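To make the behavior concrete, here is a minimal, self-contained sketch of "deliver if a consumer has room, otherwise drop". It is a simplified model and not Pulsar's actual code: the class and method names (`NonPersistentDispatchSketch`, `SketchConsumer`, `SketchDispatcher`, `droppedAfterBurst`) are hypothetical, and it assumes each consumer's free capacity can be modeled as a permit count equal to its receiverQueueSize.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of drop-on-no-capacity dispatch.
// None of these classes exist in Pulsar; they only illustrate the idea.
class NonPersistentDispatchSketch {

    /** A consumer whose permits stand in for free slots in its receiver queue. */
    static final class SketchConsumer {
        private int availablePermits;
        private final List<String> received = new ArrayList<>();

        SketchConsumer(int receiverQueueSize) {
            this.availablePermits = receiverQueueSize;
        }

        /** Accepts the message only if there is a free slot. */
        boolean tryDeliver(String message) {
            if (availablePermits == 0) {
                return false;
            }
            availablePermits--;
            received.add(message);
            return true;
        }

        int receivedCount() {
            return received.size();
        }
    }

    /** A dispatcher that never buffers: a message is delivered now or dropped. */
    static final class SketchDispatcher {
        private final List<SketchConsumer> consumers = new ArrayList<>();
        private int next;     // round-robin start position
        private int dropped;  // messages discarded so far

        void addConsumer(SketchConsumer consumer) {
            consumers.add(consumer);
        }

        void publish(String message) {
            for (int i = 0; i < consumers.size(); i++) {
                int index = (next + i) % consumers.size();
                if (consumers.get(index).tryDeliver(message)) {
                    next = (index + 1) % consumers.size();
                    return;
                }
            }
            // No consumer attached, or none has a free slot: discard.
            dropped++;
        }

        int droppedCount() {
            return dropped;
        }
    }

    /** Publishes a burst to one consumer that processes nothing meanwhile. */
    static int droppedAfterBurst(int receiverQueueSize, int burstSize) {
        SketchDispatcher dispatcher = new SketchDispatcher();
        dispatcher.addConsumer(new SketchConsumer(receiverQueueSize));
        for (int i = 0; i < burstSize; i++) {
            dispatcher.publish("msg-" + i);
        }
        return dispatcher.droppedCount();
    }

    public static void main(String[] args) {
        System.out.println(droppedAfterBurst(1000, 1500)); // prints 500
        System.out.println(droppedAfterBurst(2000, 1500)); // prints 0
    }
}
```

In this model a burst larger than the consumer's free capacity loses the excess, and a larger queue absorbs the same burst. That matches the observation in the question that raising receiverQueueSize makes the losses disappear, although a larger queue only moves the threshold and does not guarantee delivery.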

This is done because any in-memory buffering would be very limited anyway and would not be enough to change the delivery semantics.

Non-persistent topics are really designed for use cases where data loss is acceptable (e.g. sensor data that is updated every second, where you only care about the latest value). For all other cases, a persistent topic is the way to go.

Matteo Merli
  • Thanks for the confirmation. I used Vert.x for messaging before. Vert.x does its best to deliver messages and won’t consciously throw them away. I "thought" that Vert.x and Pulsar behave similarly for non-persistent messages, but I was wrong. :) – Jack Ng Jan 28 '22 at 00:16