Using provided topics for changelog and repartition while aggregating in Kafka Stream Processing

Question

I am using Kafka Streams with Spring Boot to aggregate data from a source object.

@Bean
public java.util.function.Consumer<KStream<String, SourceObject>> processSourceObject() {
    Serde<SourceObject> sourceObjectSerde = new JsonSerde<>(SourceObject.class);
    Serde<AgrregatedObject> agrregatedObjectSerde = new JsonSerde<>(AgrregatedObject.class);
    // Re-key each record by value.uniques(); changing the key is what leads to the repartition topic.
    return input -> input.map((key, value) -> new KeyValue<String, SourceObject>(value.uniques(), value))
            .groupByKey(Grouped.with(Serdes.String(), sourceObjectSerde))
            // The state store backing this aggregation is what gets the changelog topic.
            .aggregate(AgrregatedObject::new,
                    (uniques, sourceObject, destinationList) -> new SourceObjectUpdater().apply(sourceObject, destinationList),
                    Materialized.<String, AgrregatedObject>as(Stores.inMemoryKeyValueStore("custome-snapshots"))
                            .withKeySerde(Serdes.String())
                            .withValueSerde(agrregatedObjectSerde))
            // process(...) is a placeholder for whatever is done with each aggregated result.
            .toStream().foreach((foo, bar) -> process(foo, bar));
}

When I run this application, in addition to the input topic bound to processSourceObject, it auto-creates two more topics:

processSourceObject-applicationId-data-snapshots-changelog

processSourceObject-applicationId-data-snapshots-repartition

For various reasons, I want to use existing topics instead of these two. Where can I configure the names of predefined topics that my application should use for the changelog and repartition data?

`Changelog` and `repartition` topics are internal Kafka topics that enable fault-tolerant, stateful stream processing with Kafka Streams. `Changelog` topics are created when there are join/aggregation operations on the stream. `Repartition` topics are created when there are key-modifying operations on the stream. — Amit kumar, Jun 26 '20 at 06:09
Yeah, these are internal topics. Is there any way to specify custom names for these topics? — ADCDER, Jun 26 '20 at 06:52
No, you can't rename topics in Kafka. Also, the repartition topic holds our transformed messages, whereas the changelog topic keeps track of the updates made to the state store, i.e. the operations performed on the stream. They are created under the hood and each has its own task to perform. Why do you need to use a different topic when Kafka is providing you one? — Amit kumar, Jun 26 '20 at 07:29
Wouldn't it even be possible to change the first two parts of these topic names? Something like `1. myappid-snapshots-changelog` `2. myappid-snapshots-repartition`. It's due to some production environment policies and restrictions. — ADCDER, Jun 26 '20 at 09:48
Internal topics follow the naming convention `<application.id>-<operatorName>-<suffix>`. The suffix can be either changelog or repartition. And I don't think you can rename/change a Kafka topic. — Amit kumar, Jun 26 '20 at 09:53

score 0 · Answer 1 · answered Jul 04 '20 at 20:38

It depends on the version you are using. As of Apache Kafka 2.4, the Streams API lets you name all operators/processors, and those names are used for the repartition and changelog topics.

However, all internal topics are always prefixed with <application.id>- and suffixed with -repartition or -changelog -- so you can only set part of the topic names.
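
To make the fixed and configurable parts concrete, here is a small stand-alone sketch of the naming rule described above. The class InternalTopicNames and its methods are hypothetical helpers written for illustration only; they are not part of the Kafka Streams API, and myappid and snapshots are example values borrowed from the comments.

```java
// Hypothetical helper for illustration; NOT part of Kafka Streams.
final class InternalTopicNames {
    private InternalTopicNames() {}

    // <application.id>-<name>-repartition; only <name> is under your control.
    static String repartition(String applicationId, String name) {
        return applicationId + "-" + name + "-repartition";
    }

    // <application.id>-<storeName>-changelog; only <storeName> is under your control.
    static String changelog(String applicationId, String storeName) {
        return applicationId + "-" + storeName + "-changelog";
    }

    public static void main(String[] args) {
        System.out.println(repartition("myappid", "snapshots")); // myappid-snapshots-repartition
        System.out.println(changelog("myappid", "snapshots"));   // myappid-snapshots-changelog
    }
}
```

So the names asked for in the comments are reachable only if the application.id itself can be set to myappid; the prefix and suffix cannot be dropped.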

For example, you can use Grouped.as("myName") to set a name for the repartition topic.
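
As a rough sketch of how that could look in a plain Kafka Streams topology (assuming kafka-streams 2.4 or later on the classpath): Grouped.with(name, keySerde, valueSerde) names the repartition topic, and the store name passed to Materialized.as(...) names the changelog topic. The input topic source-topic, the names snapshots and snapshots-store, and the string-concatenating aggregator are placeholders of mine, not taken from the question.

```java
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.common.utils.Bytes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.Topology;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.Grouped;
import org.apache.kafka.streams.kstream.Materialized;
import org.apache.kafka.streams.state.KeyValueStore;

final class NamedInternalTopics {
    private NamedInternalTopics() {}

    static Topology build() {
        StreamsBuilder builder = new StreamsBuilder();
        builder.stream("source-topic", Consumed.with(Serdes.String(), Serdes.String()))
                // Changing the key marks the stream as needing a repartition before grouping.
                .selectKey((key, value) -> value)
                // "snapshots" becomes the middle part of the repartition topic name.
                .groupByKey(Grouped.with("snapshots", Serdes.String(), Serdes.String()))
                .aggregate(
                        () -> "",
                        (key, value, aggregate) -> aggregate + value,
                        // "snapshots-store" becomes the middle part of the changelog topic name.
                        Materialized.<String, String, KeyValueStore<Bytes, byte[]>>as("snapshots-store")
                                .withKeySerde(Serdes.String())
                                .withValueSerde(Serdes.String()));
        return builder.build();
    }

    public static void main(String[] args) {
        // Prints the topology, including the repartition topic and the state store name.
        System.out.println(build().describe());
    }
}
```

With these names, the internal topics should come out as <application.id>-snapshots-repartition and <application.id>-snapshots-store-changelog. Printing the topology description is a cheap way to check the names before deploying, since it does not need a running broker.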
