Azure Event Hub ServiceBusException causing skipped messages

Question

We are using the Azure Java event hub library to read messages from an event hub. Most of the time it works perfectly, but periodically we see exceptions of type "com.microsoft.azure.servicebus.ServiceBusException". These coincide with times when messages in the event hub appear to be skipped.

Here are some examples of exception details:

"The message container is being closed (some number here)."
- This generally hits multiple partitions at the same time, but not all of them.
- The callstack only includes com.microsoft.azure.servicebus and org.apache.qpid.proton.
"The link 'xxx' is force detached by the broker due to errors occurred in consumer(link#). Detach origin: InnerMessageReceiver was closed."
- This is generally tied to com.microsoft.azure.servicebus.amqp.AmqpException exceptions.
- The callstack only includes com.microsoft.azure.servicebus and org.apache.qpid.proton.

Example callstack:

at com.microsoft.azure.servicebus.ExceptionUtil.toException(ExceptionUtil.java:93)
at com.microsoft.azure.servicebus.MessageReceiver.onError(MessageReceiver.java:393)
at com.microsoft.azure.servicebus.MessageReceiver.onClose(MessageReceiver.java:646)
at com.microsoft.azure.servicebus.amqp.BaseLinkHandler.processOnClose(BaseLinkHandler.java:83)
at com.microsoft.azure.servicebus.amqp.BaseLinkHandler.onLinkRemoteClose(BaseLinkHandler.java:52)
at org.apache.qpid.proton.engine.BaseHandler.handle(BaseHandler.java:176)
at org.apache.qpid.proton.engine.impl.EventImpl.dispatch(EventImpl.java:108)
at org.apache.qpid.proton.reactor.impl.ReactorImpl.dispatch(ReactorImpl.java:309)
at org.apache.qpid.proton.reactor.impl.ReactorImpl.process(ReactorImpl.java:276)
at com.microsoft.azure.servicebus.MessagingFactory$RunReactor.run(MessagingFactory.java:340)
at java.lang.Thread.run(Thread.java:745)

There doesn't seem to be a way for clients of the library to recognize that a problem has occurred and to avoid moving ahead in the event hub past the skipped messages. Has anyone else run into this? Is there some other way to recognize the problem and either avoid skipping messages or retry the missed ones?

Are you using the PartitionReceiver directly (its receive() method or a ReceiveHandler?), or are you using EventProcessorHost? - Sreeram Garlapati, Mar 02 '17 at 20:29
I am facing the same issue in my application. We have 2 event hub consumers (for 2 partitions) for data and 2 more event hub consumers for operationmonitoring. The data consumers crash randomly with the same exception as above, while the operationmonitoring consumers work fine. FYI, we are using PartitionReceiver directly. - nagamanojv, Mar 03 '17 at 13:35

Sreeram Garlapati · Accepted Answer · 2017-04-06T20:33:53.157

This error DOESN'T SKIP any messages. The client throws an exception when it shouldn't have. With EventProcessorHost (EPH), this causes EPH to RESTART the receivers of the affected partitions. Applications that use the EventHubs Java client directly and don't handle these errors may experience message loss.
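For applications that use PartitionReceiver directly, one way to handle these errors without losing messages is to advance only after an event has been processed. When the receiver fails, it is reopened from the last processed offset. The following is a minimal, SDK-free sketch of that pattern. It assumes offsets can be treated as increasing numbers. `PartitionSource`, `Receiver`, `Event`, `consume` and the `flakySource` test double are hypothetical names, not part of the EventHubs Java client. In real code, `openAfter` would create a new receiver starting just after the stored offset, and that offset would usually be persisted as a checkpoint.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Sketch: never move past an event until it has been processed, and after a
 * receiver error reopen from the last processed offset.
 * PartitionSource, Receiver and Event are hypothetical stand-ins, not SDK types.
 */
public class ResumingConsumer {

    /** Hypothetical event: a numeric offset plus a payload. */
    public static final class Event {
        final long offset;
        final String body;
        Event(long offset, String body) { this.offset = offset; this.body = body; }
    }

    /** Hypothetical receiver; receive() may throw when the service closes the link. */
    public interface Receiver {
        List<Event> receive(int maxCount) throws Exception;
    }

    /** Hypothetical factory that opens a receiver starting just after {@code offset}. */
    public interface PartitionSource {
        Receiver openAfter(long offset);
    }

    /** Processes up to {@code total} events, reopening the receiver after failures. */
    public static List<String> consume(PartitionSource source, int total, int maxReopens) {
        List<String> processed = new ArrayList<>();
        long lastProcessed = -1;   // -1 means "nothing processed yet"
        int reopens = 0;
        Receiver receiver = source.openAfter(lastProcessed);
        while (processed.size() < total) {
            try {
                List<Event> batch = receiver.receive(3);
                if (batch.isEmpty()) {
                    break;         // sketch: treat an empty batch as "caught up"
                }
                for (Event e : batch) {
                    processed.add(e.body);      // "process" the event
                    lastProcessed = e.offset;   // advance only after processing succeeded
                }
            } catch (Exception ex) {
                if (++reopens > maxReopens) {
                    throw new IllegalStateException("giving up after " + maxReopens + " reopens", ex);
                }
                // Do not move ahead: resume right after the last event actually handled.
                receiver = source.openAfter(lastProcessed);
            }
        }
        return processed;
    }

    /** Test double: offsets 0..count-1; fails exactly once when reaching failAt. */
    public static PartitionSource flakySource(int count, long failAt) {
        boolean[] failed = {false};
        return startAfter -> new Receiver() {
            long next = startAfter + 1;

            @Override
            public List<Event> receive(int maxCount) throws Exception {
                if (!failed[0] && next >= failAt) {
                    failed[0] = true;
                    throw new Exception("The message container is being closed");
                }
                List<Event> batch = new ArrayList<>();
                while (batch.size() < maxCount && next < count) {
                    batch.add(new Event(next, "event-" + next));
                    next++;
                }
                return batch;
            }
        };
    }

    public static void main(String[] args) {
        List<String> out = consume(flakySource(10, 5), 10, 3);
        System.out.println(out.size() + " events, first=" + out.get(0)
                + ", last=" + out.get(out.size() - 1));
    }
}
```

Running `main` prints `10 events, first=event-0, last=event-9`. The injected failure after offset 5 triggers one reopen, and processing resumes at offset 6 with no gap and no duplicate.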

This is a bug in the retry logic of the EventHubs Java client, present in all versions up to and including 0.11.0.

Here's the corresponding issue to track progress.

In the EventHubs service, these errors happen whenever the container hosting your Event Hub's code has to close for any reason. (For the sake of explanation, imagine that we run a set of containers, like Docker containers, for every EventHub namespace.) This is a transient error: the container will eventually be reopened on another node.
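Because the container closure is transient, the natural client-side response is to wait briefly and retry rather than fail. A minimal sketch of application-level retry with exponential backoff follows. `TransientRetry`, `retry` and `looksTransient` are hypothetical helpers. Matching on the two error messages quoted in the question is an assumed heuristic, not a documented way to classify a ServiceBusException. Once the client's own retry logic handles this error, such a wrapper mainly serves as a safety net.

```java
import java.util.concurrent.Callable;
import java.util.function.Predicate;

public class TransientRetry {

    /**
     * Runs op, retrying failures accepted by isTransient with exponential backoff
     * (baseDelayMs, 2*baseDelayMs, 4*baseDelayMs, ...). Non-transient failures,
     * and the failure of the last allowed attempt, are rethrown unchanged.
     */
    public static <T> T retry(Callable<T> op, int maxAttempts, long baseDelayMs,
                              Predicate<Exception> isTransient) throws Exception {
        long delay = baseDelayMs;
        for (int attempt = 1; ; attempt++) {
            try {
                return op.call();
            } catch (Exception ex) {
                if (attempt >= maxAttempts || !isTransient.test(ex)) {
                    throw ex;
                }
                Thread.sleep(delay);   // give the service time to reopen the container elsewhere
                delay *= 2;
            }
        }
    }

    /** Hypothetical heuristic: treat the two errors quoted in the question as transient. */
    public static boolean looksTransient(Exception ex) {
        String msg = String.valueOf(ex.getMessage());
        return msg.contains("message container is being closed")
                || msg.contains("force detached");
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        String result = retry(() -> {
            if (++calls[0] < 3) {
                throw new Exception("The message container is being closed (42).");
            }
            return "ok";
        }, 5, 10, TransientRetry::looksTransient);
        System.out.println(result + " after " + calls[0] + " calls");
    }
}
```

`main` prints `ok after 3 calls`: the first two attempts fail with a transient message and are retried after 10 ms and then 20 ms. A non-transient exception is rethrown on the first attempt.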

The Java client's retry logic should have handled this error and retried. I will keep this thread posted with the fix.

EDIT

We just released 0.12.0, which fixes this issue.

Thanks! Sreeram
