My use case requires reading messages from Kafka topics and processing them in the order in which they were published to Kafka.
The Kafka producer is responsible for publishing each group of messages, already sorted, to a single Kafka topic-partition, and I need to process each group of messages in the same vertex processor, in the same order.
The image above represents the basic idea. There are a few KafkaSource processors (KS) reading from Kafka.
An edge connects them to a vertex that decodes the Kafka messages, and so on.
I could use the Kafka message key as the partitioning key, but I think I would end up with unbalanced Decode processors, since some keys may carry far more messages than others.
Given that:
- How can I create a new Partitioner? I couldn't find any example to draw on.
- In the new Partitioner, how can I identify the KS processor that emitted the message? I would like a 1-to-1 relationship between the processors of the previous vertex and the processors of the next vertex; for instance, KS#0 always sends its messages to Decode#0, KS#1 to Decode#1, and so on.
- Do I need a new partitioner for that, or is there some out-of-the-box functionality to achieve it?
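
To make the two routing options concrete, here is a minimal sketch in plain Java. It does not use the Jet API: `RoutingSketch`, `routeByKey`, `routeBySourceIndex` and the sample keys are hypothetical names made up for illustration only. It contrasts routing by the hash of the message key (what I fear will be unbalanced) with the 1-to-1 routing by source processor index that I would like to get.

```java
import java.util.Arrays;
import java.util.List;

class RoutingSketch {

    // Key-based routing: the target decoder depends only on the key's hash,
    // so a key with many messages sends all of them to one decoder.
    static int routeByKey(String key, int decoderCount) {
        return Math.floorMod(key.hashCode(), decoderCount);
    }

    // Desired routing: source processor i always feeds decoder i
    // (assumes the same number of source and decode processors).
    static int routeBySourceIndex(int sourceIndex, int decoderCount) {
        return Math.floorMod(sourceIndex, decoderCount);
    }

    // Counts how many messages each decoder would receive under key-based routing.
    static int[] loadByKey(List<String> keys, int decoderCount) {
        int[] load = new int[decoderCount];
        for (String key : keys) {
            load[routeByKey(key, decoderCount)]++;
        }
        return load;
    }

    public static void main(String[] args) {
        // Made-up keys: one key dominates the traffic.
        List<String> keys = Arrays.asList("order-1", "order-1", "order-1", "order-2");
        System.out.println("Load per decoder by key: " + Arrays.toString(loadByKey(keys, 3)));
        for (int ks = 0; ks < 3; ks++) {
            System.out.println("KS#" + ks + " -> Decode#" + routeBySourceIndex(ks, 3));
        }
    }
}
```

With key-based routing, the three `order-1` messages always land on the same decoder, whatever the hash happens to be. With index-based routing, the load on each Decode processor simply mirrors the load of the Kafka partitions read by the matching KS processor, and per-partition order is preserved.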