Understanding Azure Event Hubs partitioned consumer pattern

Question

Azure Event Hubs uses the partitioned consumer pattern described in the docs. I have some problems understanding the consumer side of this model when it comes to a real-world scenario.

So let's say I have 1000 messages sent to an event hub with 4 partitions, without defining any partition ID. This means the messages will be distributed across all partitions using the round-robin method.
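To make the scenario concrete, here is a toy sketch of an idealized round-robin distribution. It is plain Python, not the Azure SDK, and `assign_round_robin` is a hypothetical helper; the real service does not promise a perfectly even split.

```python
from collections import Counter
from itertools import cycle

PARTITION_COUNT = 4  # taken from the scenario above


def assign_round_robin(message_count, partition_count):
    """Return a list that maps each message index to a partition id."""
    partition_ids = cycle(range(partition_count))
    return [next(partition_ids) for _ in range(message_count)]


assignment = assign_round_robin(1000, PARTITION_COUNT)
per_partition = Counter(assignment)
print(dict(per_partition))  # {0: 250, 1: 250, 2: 250, 3: 250}
```

Since the sender does not pick a partition, a reader cannot know in advance where a given message ends up, which is why all 4 partitions have to be read.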

Now I want to have two applications distributing the messages to two different databases. My questions are:

Let's say for the first application, I want to store all messages in Database 1. This means, for maximum speed, in my consumer application I need to have 4 threads (consumers), each listening to one partition of the event hub, right? Each of them also has to store its own offset for the partition it's reading (checkpoint).
Let's say my second application wants to filter the messages and only store a subset of them in Database 2. There I also need 4 consumers since I don't know which message goes to which partition, right?
Also, for the two applications I need to have two consumer groups, but why? Is the filtering of the messages defined in the consumer group? I don't really get why I need this, since the applications' consumers store the partition checkpoints by themselves and I can do the filtering within the applications themselves.

I know there is the EventProcessorHost class, but I want to understand the concept of Event Hubs on a lower level.

Peter Bons · Accepted Answer · 2019-11-28T11:07:34.557

Let's say for the first application, I want to store all messages in Database 1. This means, for maximum speed, in my consumer application I need to have 4 threads (consumers), each listening to one partition of the event hub, right? Each of them also has to store its own offset for the partition it's reading (checkpoint).

Correct, you should have one process per provisioned partition. So, if you have 4 partitions you should have 4 processes, each processing the messages of a specific partition. If you process the messages using an EventProcessorHost, it will take care of spinning up the processes for you.
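As a rough illustration of "one reader per partition, each with its own checkpoint", here is a toy in-memory sketch. It does not use the Azure SDK: `partitions`, `checkpoints`, `database_1` and `read_partition` are made-up stand-ins, and threads stand in for the processes.

```python
import threading

# Toy stand-in for an event hub with 4 partitions (not the Azure SDK).
partitions = {
    pid: [f"msg-{i}" for i in range(1000) if i % 4 == pid]
    for pid in range(4)
}

database_1 = []            # stand-in for Database 1
checkpoints = {}           # partition id -> next offset to read
lock = threading.Lock()    # protects the two shared structures above


def read_partition(pid):
    """Read one partition from its checkpoint to the end, then save the new offset."""
    offset = checkpoints.get(pid, 0)
    events = partitions[pid][offset:]
    with lock:
        database_1.extend(events)
        checkpoints[pid] = offset + len(events)


# One reader per partition, as described above.
readers = [threading.Thread(target=read_partition, args=(pid,)) for pid in partitions]
for reader in readers:
    reader.start()
for reader in readers:
    reader.join()

print(len(database_1))  # 1000
```

Each reader only touches the checkpoint of its own partition, so a restarted reader can resume from where that partition was left off.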

Let's say my second application wants to filter the messages and only store a subset of them in Database 2. There I also need 4 consumers since I don't know which message goes to which partition, right?

What do you mean by a consumer? You need another 4 processes to process the messages, but they should be configured to read using a different consumer group. Otherwise they will compete with the processes of the first application.

Also, for the two applications I need to have two consumer groups, but why? Is the filtering of the messages defined in the consumer group? I don't really get why I need this, since the applications' consumers store the partition checkpoints by themselves and I can do the filtering within the applications themselves.

Let us define a consumer group:

Consumer groups enable multiple consuming applications to each have a separate view of the incoming message stream, and to read the stream independently at its own pace with its own offset

So yes, you need 2 different consumer groups. Each consumer group will get all messages sent to the event hub partitions. Each consumer group tracks its own progress in the stream of messages. That is why you need two for your scenario.

Say you define an additional consumer group called "App2-Consumer-Group". Its reader processes will receive all messages but should take no action for messages they are not interested in.
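A minimal sketch of that idea, assuming a toy in-memory hub rather than the real service: `ToyEventHub`, its methods and the `important` flag are hypothetical. Checkpoints are keyed by consumer group and partition, so both applications see every message and the second one filters in its own code.

```python
from collections import defaultdict


class ToyEventHub:
    """In-memory model: checkpoints are kept per (consumer group, partition)."""

    def __init__(self, partition_count):
        self.partitions = [[] for _ in range(partition_count)]
        self.checkpoints = defaultdict(int)  # (group, partition id) -> next offset
        self._next_partition = 0

    def send(self, event):
        # Idealized round-robin, since no partition id is given.
        self.partitions[self._next_partition].append(event)
        self._next_partition = (self._next_partition + 1) % len(self.partitions)

    def receive(self, consumer_group, partition_id):
        # Return everything after this group's checkpoint, then advance it.
        key = (consumer_group, partition_id)
        events = self.partitions[partition_id][self.checkpoints[key]:]
        self.checkpoints[key] = len(self.partitions[partition_id])
        return events


hub = ToyEventHub(partition_count=4)
for i in range(1000):
    hub.send({"id": i, "important": i % 10 == 0})

# Application 1 stores everything; application 2 receives everything but filters.
database_1 = [e for pid in range(4) for e in hub.receive("$Default", pid)]
database_2 = [
    e
    for pid in range(4)
    for e in hub.receive("App2-Consumer-Group", pid)
    if e["important"]
]
print(len(database_1), len(database_2))  # 1000 100
```

The filtering lives entirely in the second application; the consumer group only gives it an independent position in the stream.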

If you did not create an additional consumer group, the reader processes for the default consumer group would process the messages for the first application and mark them as processed using the checkpointing mechanism. The reader processes for the second application would not get any messages, since they are already marked as processed. (In real life, when using one consumer group, some messages might be picked up by the reader processes of the first application and some by the reader processes of the second application, as the processes will try to get a lock on a specific partition.)
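The failure mode can be sketched with a single partition and a checkpoint that is keyed only by consumer group. This is a simplified model with made-up names (`partition`, `checkpoints`, `read`), and it assumes both applications share the same checkpoint store.

```python
# One partition's worth of events and a checkpoint store keyed by consumer group.
partition = [f"msg-{i}" for i in range(250)]
checkpoints = {}  # consumer group -> next offset to read


def read(consumer_group):
    """Return the events after the group's checkpoint and advance the checkpoint."""
    offset = checkpoints.get(consumer_group, 0)
    events = partition[offset:]
    checkpoints[consumer_group] = len(partition)
    return events


# Both applications read through the same (default) consumer group.
app1_events = read("$Default")
app2_events = read("$Default")  # checkpoint already sits at the end

print(len(app1_events), len(app2_events))  # 250 0
```

With a second consumer group, the second call would start from its own checkpoint at offset 0 and see all 250 events.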

I think this image shows clearly how consumer groups track their own progress in the stream of messages, and hence why you need two of them if the 2 applications have different processing logic:

Thanks, much more clear now. My misunderstanding was that the processes just have to store an offset by themselves and don't "mark" the messages. Btw. can I configure the consumer groups somehow to not get all the messages and do some pre-filtering? — Tobias von Falkenhayn, Nov 28 '19 at 15:31
@TobiasvonFalkenhayn, yes, it's possible. You can do something as per this [answer](https://stackoverflow.com/questions/59014664/azure-event-hub-directing-messages-to-consumers?answertab=active#tab-top). — Ivan Glasenberg, Nov 29 '19 at 02:19
