
If I create multiple of the below, one KafkaSource for each topic:

KafkaSource<T> kafkaDataSource = KafkaSource.<T>builder()
        .setBootstrapServers(consumerProps.getProperty("bootstrap.servers"))
        .setTopics(topic)
        .setDeserializer(deserializer)
        .setGroupId(identifier)
        .setProperties(consumerProps)
        .build();

The deserializer seems to run into some issue and ends up reading data from a different topic, with a different schema than the one it was meant for, and fails!

If I provide all topics in the same KafkaSource, then the watermarks seem to progress across the topics together.
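
For comparison, the single-source setup looks roughly like this (the topic names here are placeholders, not from my actual setup; setTopics accepts multiple topics as varargs or a List):

KafkaSource<T> kafkaDataSource = KafkaSource.<T>builder()
        .setBootstrapServers(consumerProps.getProperty("bootstrap.servers"))
        .setTopics("topicA", "topicB") // placeholder topic names
        .setDeserializer(deserializer)
        .setGroupId(identifier)
        .setProperties(consumerProps)
        .build();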


DataStream<T> dataSource = environment.fromSource(kafkaDataSource,
        WatermarkStrategy.<T>forBoundedOutOfOrderness(Duration.ofMillis(2000))
            .withTimestampAssigner((event, timestamp) -> {...}),
        "");

Also, the Avro data in Kafka itself carries a leading magic byte for the schema (the schema info is embedded); I am not using any external Avro registry (it's all in the libraries).

It works fine with FlinkKafkaConsumer (I created multiple instances of it).

FlinkKafkaConsumer<T> kafkaConsumer = new FlinkKafkaConsumer<>(topic, deserializer, consumerProps);
kafkaConsumer.assignTimestampsAndWatermarks(
        WatermarkStrategy.<T>forBoundedOutOfOrderness(Duration.ofMillis(2000))
            .withTimestampAssigner((event, timestamp) -> {...}));

I'm not sure if the problem is in the way I am using it? Any pointers on how to solve this would be appreciated. Also, FlinkKafkaConsumer is deprecated.


1 Answer


Figured it out based on the code here: Custom avro message deserialization with Flink (https://stackoverflow.com/questions/72006018/custom-avro-message-deserialization-with-flink). I implemented the open method and made the instance fields of the deserializer transient.
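
For reference, here is a minimal sketch of what that fix can look like, assuming a KafkaRecordDeserializationSchema producing Avro GenericRecords, a writer schema known to the job as a string, and a single leading magic byte before the Avro body. The class name, schema handling, and byte offset are illustrative assumptions, not the exact code from the linked answer:

import java.io.IOException;

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.BinaryDecoder;
import org.apache.avro.io.DecoderFactory;
import org.apache.flink.api.common.serialization.DeserializationSchema;
import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.flink.connector.kafka.source.reader.deserializer.KafkaRecordDeserializationSchema;
import org.apache.flink.util.Collector;
import org.apache.kafka.clients.consumer.ConsumerRecord;

public class EmbeddedSchemaAvroDeserializer
        implements KafkaRecordDeserializationSchema<GenericRecord> {

    // Serializable schema representation, shipped with the job graph.
    private final String schemaString;

    // Non-serializable Avro machinery: marked transient and rebuilt in open(),
    // so it is never captured in the serialized job graph.
    private transient Schema schema;
    private transient GenericDatumReader<GenericRecord> datumReader;
    private transient BinaryDecoder decoder;

    public EmbeddedSchemaAvroDeserializer(String schemaString) {
        this.schemaString = schemaString;
    }

    @Override
    public void open(DeserializationSchema.InitializationContext context) {
        // Called once per parallel subtask, after the job graph has been deserialized.
        schema = new Schema.Parser().parse(schemaString);
        datumReader = new GenericDatumReader<>(schema);
    }

    @Override
    public void deserialize(ConsumerRecord<byte[], byte[]> record, Collector<GenericRecord> out)
            throws IOException {
        byte[] payload = record.value();
        // Skip the leading magic byte before the Avro body
        // (adjust offset/length to match the actual wire format).
        decoder = DecoderFactory.get().binaryDecoder(payload, 1, payload.length - 1, decoder);
        out.collect(datumReader.read(null, decoder));
    }

    @Override
    public TypeInformation<GenericRecord> getProducedType() {
        // Generic (Kryo-based) type info for simplicity; with flink-avro on the
        // classpath, an Avro-specific TypeInformation is usually a better fit.
        return TypeInformation.of(GenericRecord.class);
    }
}

The source is then built with .setDeserializer(new EmbeddedSchemaAvroDeserializer(schemaString)), one instance per topic/schema. Because everything used at runtime is rebuilt in open(), each parallel subtask gets its own fresh Avro reader rather than whatever state was baked in when the job graph was serialized.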
