
I am running the Beam pipeline below, which reads from a local Kafka INPUT_TOPIC and writes to another local Kafka OUTPUT_TOPIC. I created a publisher to feed INPUT_TOPIC (manually) and a consumer to check what arrives on OUTPUT_TOPIC, but I am wondering whether this is a correct setup to test exactly-once semantics. Roughly, the publisher looks like the sketch below.
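Simplified publisher sketch (the topic name, record count, and payloads are illustrative placeholders, not my exact code):

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.LongSerializer;
import org.apache.kafka.common.serialization.StringSerializer;

Properties props = new Properties();
props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, LongSerializer.class.getName());
props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
// idempotence avoids producer-side duplicates on retries
props.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, true);

try (KafkaProducer<Long, String> producer = new KafkaProducer<>(props)) {
  // publish a known number of uniquely numbered records
  for (long i = 0; i < 100; i++) {
    producer.send(new ProducerRecord<>("input-topic", i, "message-" + i));
  }
  producer.flush();
}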

I am relatively new to Beam and Kafka, so I am looking for suggestions on how to test this pipeline in a better way and confirm that exactly-once semantics actually holds in a local environment.
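The checking consumer I have is roughly the sketch below: it reads OUTPUT_TOPIC with read_committed isolation, counts unique values, and flags duplicates. The topic name, the 30-second poll window, and the expected count of 100 (matching the publisher sketch above) are placeholder assumptions:

import java.time.Duration;
import java.util.Collections;
import java.util.HashSet;
import java.util.Properties;
import java.util.Set;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

Properties props = new Properties();
props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
props.put(ConsumerConfig.GROUP_ID_CONFIG, "eos-verifier");
props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
// only see records from committed Kafka transactions
props.put(ConsumerConfig.ISOLATION_LEVEL_CONFIG, "read_committed");
props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

Set<String> seen = new HashSet<>();
int duplicates = 0;
try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
  consumer.subscribe(Collections.singletonList("output-topic"));
  long deadline = System.currentTimeMillis() + 30_000; // stop polling after 30s
  while (System.currentTimeMillis() < deadline) {
    ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
    for (ConsumerRecord<String, String> record : records) {
      if (!seen.add(record.value())) {
        duplicates++; // same value delivered more than once
      }
    }
  }
}
System.out.printf("unique=%d duplicates=%d (expected 100 unique, 0 duplicates)%n",
    seen.size(), duplicates);

The read_committed isolation matters here because withEOS publishes through Kafka transactions, so a consumer left at the default read_uncommitted isolation (e.g. a plain console consumer) can see records from aborted transactions that committed readers never will.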

Note: I have installed Apache Spark on my machine and run the pipeline with the -Pspark-runner option.

Example Beam Pipeline

import com.google.common.collect.ImmutableMap;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.kafka.KafkaIO;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.Values;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.common.serialization.LongDeserializer;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.kafka.common.serialization.StringSerializer;

// args are the command-line arguments from main(), e.g. --runner=SparkRunner
Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).withValidation().create());

p.apply(KafkaIO.<Long, String>read()
    .withBootstrapServers("localhost:9092")
    .withTopic(INPUT_TOPIC)
    .withKeyDeserializer(LongDeserializer.class)
    .withValueDeserializer(StringDeserializer.class)
    // disable auto-commit; offsets are committed via commitOffsetsInFinalize() instead
    .withConsumerConfigUpdates(ImmutableMap.of(
        ConsumerConfig.GROUP_ID_CONFIG, "test.group",
        ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, false))
    // only read records from committed Kafka transactions
    .withReadCommitted()
    .commitOffsetsInFinalize()
    .withoutMetadata())
  .apply(Values.<String>create())
  .apply(KafkaIO.<Void, String>write()
    .withBootstrapServers("localhost:9092")
    .withTopic(OUTPUT_TOPIC)
    .withValueSerializer(StringSerializer.class)
    // transactional exactly-once sink: 1 shard, fixed sink group id
    .withEOS(1, "eos-sink-group-id")
    .values() // write values only; no keys
);

p.run();

Thanks
