I'm working with Apache Kafka and I've been experimenting with Kafka Streams. What I'm trying to achieve is very simple, at least in words, and it can be achieved easily with the regular plain Consumer/Producer approach (a rough sketch of that approach follows the list):
- Read from a dynamic list of topics
- Do some processing on the message
- Push the message to another topic whose name is computed based on the message content
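For context, this is roughly what I mean by the plain Consumer/Producer approach (just a sketch; consumerProperties, producerProperties, process and computeTopicName are my own helpers, and MyObject is my message type):

void runPlainConsumerProducer(List<String> inputTopics) {
    // plain clients: poll, process, then produce to a topic derived from the content
    KafkaConsumer<String, MyObject> consumer = new KafkaConsumer<>(consumerProperties());
    KafkaProducer<String, MyObject> producer = new KafkaProducer<>(producerProperties());
    consumer.subscribe(inputTopics);
    while (true) {
        ConsumerRecords<String, MyObject> records = consumer.poll(100);
        for (ConsumerRecord<String, MyObject> record : records) {
            MyObject processed = process(record.value());       // my processing step
            String outputTopic = computeTopicName(processed);   // topic name depends on the message content
            producer.send(new ProducerRecord<>(outputTopic, record.key(), processed));
        }
    }
}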
Initially I thought I could create a custom sink or inject some kind of endpoint resolver in order to programmatically define the topic name for each single message, but ultimately I couldn't find any way to do that. So I dug into the code and found the ProducerInterceptor interface, which is (quoting from the JavaDoc):
A plugin interface that allows you to intercept (and possibly mutate) the records received by the producer before they are published to the Kafka cluster.
And its onSend method:
This is called from KafkaProducer.send(ProducerRecord) and KafkaProducer.send(ProducerRecord, Callback) methods, before key and value get serialized and partition is assigned (if partition is not specified in ProducerRecord).
It seemed like the perfect solution for me, as I can effectively return a new ProducerRecord with the topic name I want. However, apparently there's a bug (I've opened an issue on their JIRA: KAFKA-4691): the method is actually called after the key and value have already been serialized. Bummer, as I don't think doing an additional deserialization at this point is acceptable.
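Just to illustrate why: as far as I can tell, the Streams-internal producer hands the interceptor the already-serialized byte[] key and value, so routing by content would require something like the sketch below (ByteRoutingInterceptor, MyObjectDeserializer and computeTopicName are my own placeholder names), paying for a second deserialization on every record:

public class ByteRoutingInterceptor implements ProducerInterceptor<byte[], byte[]> {

    private final MyObjectDeserializer valueDeserializer = new MyObjectDeserializer();

    @Override
    public ProducerRecord<byte[], byte[]> onSend(ProducerRecord<byte[], byte[]> record) {
        // the value is already serialized here, so it has to be deserialized again
        MyObject obj = valueDeserializer.deserialize(record.topic(), record.value());
        String topic = computeTopicName(obj);
        return new ProducerRecord<>(topic, record.partition(), record.timestamp(), record.key(), record.value());
    }

    @Override
    public void onAcknowledgement(RecordMetadata metadata, Exception exception) { }

    @Override
    public void close() { }

    @Override
    public void configure(Map<String, ?> configs) { }
}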
So my question to the more experienced and knowledgeable users out there: what would be an efficient and elegant way of achieving this? Any input, ideas or suggestions are welcome.
Thanks in advance for your help/comments/suggestions/ideas.
Below are some code snippets of what I've tried:
public static void main(String[] args) throws Exception {
    StreamsConfig streamingConfig = new StreamsConfig(getProperties());
    StringDeserializer stringDeserializer = new StringDeserializer();
    StringSerializer stringSerializer = new StringSerializer();
    MyObjectSerializer myObjectSerializer = new MyObjectSerializer();
    // addSource expects deserializers, so the value type needs its own deserializer here
    MyObjectDeserializer myObjectDeserializer = new MyObjectDeserializer();

    TopologyBuilder topologyBuilder = new TopologyBuilder();
    // subscribe to every topic matching the pattern and feed the records into the custom processor
    topologyBuilder.addSource("SOURCE", stringDeserializer, myObjectDeserializer, Pattern.compile("input-.*"))
                   .addProcessor("PROCESS", MyCustomProcessor::new, "SOURCE");

    System.out.println("Starting PurchaseProcessor Example");
    KafkaStreams streaming = new KafkaStreams(topologyBuilder, streamingConfig);
    streaming.start();
    System.out.println("Now started PurchaseProcessor Example");
}
private static Properties getProperties() {
    Properties props = new Properties();
    .....
    .....
    props.put(StreamsConfig.producerPrefix(ProducerConfig.INTERCEPTOR_CLASSES_CONFIG), "com.test.kafka.streams.OutputTopicRouterInterceptor");
    return props;
}
OutputTopicRouterInterceptor onSend implementation:
@Override
public ProducerRecord<String, MyObject> onSend(ProducerRecord<String, MyObject> record) {
    MyObject obj = record.value();
    // compute the destination topic from the message content and re-wrap the record
    String topic = computeTopicName(obj);
    ProducerRecord<String, MyObject> newRecord = new ProducerRecord<String, MyObject>(topic, record.partition(), record.timestamp(), record.key(), obj);
    return newRecord;
}