I have a Flink DataStream job that flows like this:
- before DataStream definition:
  - define the list of Kafka topics
  - create a HashMap of String kafkaTopic -> OutputTag (side output)
  - define the Kafka source
- during DataStream definition:
  - add the Kafka source to the env using fromSource
  - define a ProcessFunction that pulls the Kafka topic from the message metadata, looks up the OutputTag in the HashMap of (kafkaTopic -> OutputTag), and emits the message to that side output
- after DataStream definition:
  - iterate over the HashMap of kafkaTopic -> OutputTag
  - create a JDBC sink with a unique insert statement for each side output
  - attach a sink to each side OutputTag (a sketch of the whole flow follows this list)
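
For reference, here is a minimal sketch of that flow. The `TopicMessage` type, the `wire` method, the table naming, and the JDBC URL are all placeholders of mine, not the real job; the real job uses whatever deserialized type and insert statements it already has:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import org.apache.flink.connector.jdbc.JdbcConnectionOptions;
import org.apache.flink.connector.jdbc.JdbcExecutionOptions;
import org.apache.flink.connector.jdbc.JdbcSink;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.functions.ProcessFunction;
import org.apache.flink.util.Collector;
import org.apache.flink.util.OutputTag;

public class TopicRoutingJob {

    /** Hypothetical record type: topic name pulled from Kafka metadata plus the raw value. */
    public static class TopicMessage {
        public String topic;
        public String value;
    }

    public static void wire(DataStream<TopicMessage> fromKafka, List<String> topics) {
        // before DataStream definition: one OutputTag per known topic
        // (anonymous subclass so Flink can capture the type information)
        final Map<String, OutputTag<TopicMessage>> tags = new HashMap<>();
        for (String topic : topics) {
            tags.put(topic, new OutputTag<TopicMessage>(topic) {});
        }

        // during DataStream definition: route each message to its topic's side output;
        // the main output stays unused
        SingleOutputStreamOperator<TopicMessage> routed = fromKafka.process(
                new ProcessFunction<TopicMessage, TopicMessage>() {
                    @Override
                    public void processElement(TopicMessage msg, Context ctx,
                                               Collector<TopicMessage> out) {
                        OutputTag<TopicMessage> tag = tags.get(msg.topic);
                        if (tag != null) {
                            ctx.output(tag, msg);
                        }
                    }
                });

        // after DataStream definition: one JDBC sink per side output,
        // each with its own insert statement (table naming and URL are placeholders)
        for (Map.Entry<String, OutputTag<TopicMessage>> e : tags.entrySet()) {
            routed.getSideOutput(e.getValue()).addSink(JdbcSink.sink(
                    "INSERT INTO " + e.getKey() + "_table (payload) VALUES (?)",
                    (stmt, msg) -> stmt.setString(1, msg.value),
                    JdbcExecutionOptions.defaults(),
                    new JdbcConnectionOptions.JdbcConnectionOptionsBuilder()
                            .withUrl("jdbc:postgresql://localhost:5432/db") // hypothetical
                            .build()));
        }
    }
}
```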
I'd like to avoid building the list of Kafka topics before defining the stream. Instead, I want to use the topic-pattern (regex) support of the KafkaSource to consume from all topics that match the pattern.
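
Something like this, where the broker address, pattern, and group id are placeholders (value-only deserialization is shown for brevity; the real job would need a KafkaRecordDeserializationSchema so the topic name survives into the message):

```java
import java.util.regex.Pattern;

import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.connector.kafka.source.KafkaSource;
import org.apache.flink.connector.kafka.source.enumerator.initializer.OffsetsInitializer;

public class PatternSource {
    public static KafkaSource<String> build() {
        return KafkaSource.<String>builder()
                .setBootstrapServers("localhost:9092")          // placeholder broker
                .setTopicPattern(Pattern.compile("events-.*"))  // placeholder pattern
                .setGroupId("multi-topic-job")
                .setStartingOffsets(OffsetsInitializer.earliest())
                .setValueOnlyDeserializer(new SimpleStringSchema())
                // periodically discover newly created topics that match the pattern
                .setProperty("partition.discovery.interval.ms", "60000")
                .build();
    }
}
```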
Is it possible to create new side output tags and sinks at runtime inside the process function? That is, if I encounter a new Kafka topic, can I create a side output, add it to the stream, and then attach a new sink to that side output?
The more I think about it, the more I suspect this is not possible.
My alternative plan is to use a Kafka client to build the list of topics in the 'before DataStream definition' step above. In that case I would have to restart the job to consume from new topics.
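
A sketch of that enumeration step, assuming the Kafka AdminClient is used (the broker address and pattern are placeholders):

```java
import java.util.List;
import java.util.Properties;
import java.util.concurrent.ExecutionException;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;

public class TopicDiscovery {
    /** Lists all topic names on the cluster that match the given pattern. */
    public static List<String> matchingTopics(String bootstrapServers, Pattern pattern)
            throws ExecutionException, InterruptedException {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrapServers);
        try (AdminClient admin = AdminClient.create(props)) {
            return admin.listTopics().names().get().stream()
                    .filter(name -> pattern.matcher(name).matches())
                    .sorted()
                    .collect(Collectors.toList());
        }
    }
}
```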
I could have thousands of topics, which is why I want to define them dynamically rather than enumerate them up front.