I have events of different types. For example, some events are telemetry data, some are error information, and so on.

I thought it would be a good idea to create several IEventProcessor implementations, one for each event type, so that each implementation handles its events differently, for example by writing to a file or to a database.

What's the best way to route events to a specific EventProcessor?

  • Should I let an EventProcessor monitor a specific partition key and, if so, how?
  • Should I use the EventProcessorHost constructor that lets me specify a consumer group name? If so, how can I send to a specific consumer group using the EventHubClient? I do not see an option to specify a consumer group there.
  • Should I do none of the above and just check each incoming EventData for a specific property, ignoring the ones I am not interested in?

I must say that I find the relation between partition key and consumer group (if there is any) badly documented.

I've used option 2, but so far each EventProcessor gets all messages, not just the ones for the consumer group name specified in the EventProcessorHost constructor.

Peter Bons

1 Answer

Great question!

Before answering, I want to reiterate a couple of principles we followed while building Event Hubs.

  • We wanted Event Hubs to be a highly durable, high-throughput event ingestion pipeline. Azure already had pub-sub services such as Queues and Topics (similar to AWS SQS and Google Pub/Sub); the differentiating factor that justified a new service was a higher-throughput variant (with low latency, of course). We delivered on this goal with one trade-off: the service does not perform any per-message computation, such as executing a filter. When you need per-message semantics (de-duplication per message, acknowledging receipt per message or, in your case, filtering on a per-message property) and your throughput requirements are low, a Queue or Topic might be your best bet.

  • We also envisioned that senders (publishers) operate at a much higher scale and vary significantly by scenario, so we introduced three sending patterns: Send, Send with PartitionKey, and Send directly to a partition. This is where the notion of PartitionKey appears on the send side: it translates to a particular partition (consider the PartitionKey a hint the Event Hubs service uses to place all events with the same PartitionKey on the same partition). When consuming events, however, Event Hubs does not directly expose any notion of PartitionKey. There is no relation between ConsumerGroups and PartitionKey.

  • Receivers are usually just the compute roles and are limited in number, so we exposed one generic receive (consume) pattern: receive from a partition. There can still be different types of consumers, depending on factors such as the speed of consumption (real-time vs. historical) or the type of data, and for that we exposed multiple consumer groups. Although you could create 20 consumer groups, there is one interesting limitation: each throughput unit purchased yields 1 MB/s in and 2 MB/s out. Every consumer group reads the full stream, so if the send side is fully utilized, the egress budget covers only 2 consumer groups. So if you are processing the exact same stream and have different ways to handle each event, but each of them takes an equal amount of time, using the same consumer group makes more sense.
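
To make the send-side and receive-side asymmetry concrete, here is a minimal in-memory model, not the Event Hubs SDK. The names `partition_for` and `ModelEventHub` are hypothetical, and the SHA-256 hash is only a stand-in: the placement function the service actually uses is not described here. The point it illustrates is that the partition key matters only when sending, while a receiver addresses a partition and never a key.

```python
import hashlib


def partition_for(partition_key: str, partition_count: int) -> int:
    """Map a partition key to a partition index (hypothetical stand-in hash)."""
    digest = hashlib.sha256(partition_key.encode("utf-8")).digest()
    # A stable hash guarantees that equal keys always land on the same partition.
    return int.from_bytes(digest[:8], "big") % partition_count


class ModelEventHub:
    """Toy event hub: a fixed list of append-only partitions."""

    def __init__(self, partition_count: int) -> None:
        self.partitions = [[] for _ in range(partition_count)]

    def send(self, body: str, partition_key: str) -> int:
        """Send with a partition key; returns the partition that was chosen."""
        index = partition_for(partition_key, len(self.partitions))
        self.partitions[index].append(body)
        return index

    def receive(self, partition_id: int) -> list:
        """Receive from a partition; no partition key is involved on this side."""
        return list(self.partitions[partition_id])
```

In this model, two events sent with the key `"device-42"` end up on the same partition, and a receiver sees them only by reading that partition, mixed with whatever other keys hashed to it.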

To answer your question: it really depends.

Here are a few solutions:

  • Since your scenario has a mix of event types, you need to decide whether there is any case where a single consumer/processor must read and process all types of events. One example we usually see: one consumer group keeps a count of all errors while another consumer group performs a specific action per error type. If you don't need that, sending each event type to a different event hub and then using one consumer group with the specific IEventProcessor is an option.

  • If you have scenarios where all events must be sent to the same event hub, and you know that some event types are (or need to be) processed very quickly, consider using different consumer groups, each tied to a specific IEventProcessor implementation that ignores the other event types. For example, if the ErrorInfo events and Special events need attention in real time, while the telemetry data can tolerate a 15-minute delay due to slow processing or peak load, I would create one consumer group, name it Real-time, and tie it to an IEventProcessor that handles two types: Error and Special. Then create a second consumer group and tie it to an IEventProcessor that handles Telemetry events.
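
The second solution can be sketched with a small in-memory model. This is not the EventProcessorHost API: `Event`, `ConsumerGroupReader`, and the field names are hypothetical, and a plain list stands in for one partition. It assumes what the solution describes: every consumer group reads the whole stream, keeps its own position, and its processor does nothing for event types it does not handle.

```python
from dataclasses import dataclass, field


@dataclass
class Event:
    event_type: str
    body: str


@dataclass
class ConsumerGroupReader:
    """One consumer group with its own processor and its own checkpoint."""

    name: str
    handled_types: set
    checkpoint: int = 0  # position in the stream, tracked per consumer group
    handled: list = field(default_factory=list)

    def process(self, stream: list) -> None:
        """Read everything after the checkpoint; skip types not handled."""
        for event in stream[self.checkpoint:]:
            if event.event_type in self.handled_types:
                self.handled.append(event.body)
            # Unhandled types are a no-op, but the checkpoint still advances.
            self.checkpoint += 1


stream = [
    Event("Error", "e1"),
    Event("Telemetry", "t1"),
    Event("Special", "s1"),
]

realtime = ConsumerGroupReader("Real-time", {"Error", "Special"})
telemetry = ConsumerGroupReader("Telemetry", {"Telemetry"})

realtime.process(stream)   # the fast group is fully caught up
# The slow group has not run yet; its checkpoint is independent.
```

Because each reader carries its own checkpoint, the Real-time group can be fully caught up while the Telemetry group lags behind on the same stream, which is the reason to split them into separate consumer groups.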

Sreeram Garlapati
  • Thank you for the great answer! I have one more question. Say I decide to go with the 2nd solution: am I correct that I should tie the IEventProcessor to the ConsumerGroup using the consumergroupname parameter of the EventProcessorHost constructor? How do I specify the consumer group when sending, so as to make "it will ignore the other EventTypes" happen? – Peter Bons Nov 30 '15 at 21:35
  • Correct. ConsumerGroupName should be passed to the EventProcessorHost constructor. You cannot specify the name of the ConsumerGroup while sending to Event Hubs. In solution (2), all ConsumerGroups will receive all events, and the IEventProcessor will "do nothing" for event types it cannot handle. – Sreeram Garlapati Nov 30 '15 at 22:31
  • In the solution (2) - all ConsumerGroups will receive all events - and I meant that your IEventProcessor implementation should be a no-op ("do nothing") for eventTypes it cannot handle... – Sreeram Garlapati Nov 30 '15 at 22:51
  • Ah, I seem to finally grasp the idea of that whole consumergroupname thing. The fact that the checkpoint is based on the specified consumer group is the key. I was not sure what the purpose of consumer groups was. – Peter Bons Dec 01 '15 at 13:56
  • http://stackoverflow.com/questions/29127648/should-event-hubs-be-split-on-message-type/29164614#29164614 discusses more of the trade-offs in splitting events between event hubs. – cacsar Dec 02 '15 at 02:12