I have a Beam pipeline that consumes streaming events and processes them in multiple stages (PTransforms). See the following code:
pipeline.apply("Read Data from Stream", StreamReader.read())
        .apply("Decode event and extract relevant fields", ParDo.of(new DecodeExtractFields()))
        .apply("Deduplicate process", ParDo.of(new Deduplication()))
        .apply("Conversion, Mapping and Persisting", ParDo.of(new DataTransformer()))
        .apply("Build Kafka Message", ParDo.of(new PrepareMessage()))
        .apply("Publish", ParDo.of(new PublishMessage()))
        .apply("Commit offset", ParDo.of(new CommitOffset()));
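For a manual commit in the last stage, each element would have to carry its Kafka metadata (topic, partition, offset) through every intermediate stage, since `CommitOffset` otherwise has no way of knowing what to commit. A minimal sketch of what I mean, assuming the read is applied without `.withoutMetadata()` so the first stage receives `KafkaRecord<String, String>` elements (`DecodedEvent` here is a hypothetical POJO, not part of my code):

```java
// Sketch only: the first DoFn forwards the record's topic/partition/offset
// alongside the decoded payload, so the final CommitOffset stage still
// knows which offset the element came from.
static class DecodeExtractFields extends DoFn<KafkaRecord<String, String>, DecodedEvent> {
    @ProcessElement
    public void process(@Element KafkaRecord<String, String> record,
                        OutputReceiver<DecodedEvent> out) {
        // DecodedEvent is a hypothetical carrier for payload + offset metadata.
        out.output(new DecodedEvent(
                record.getKV().getValue(),
                record.getTopic(), record.getPartition(), record.getOffset()));
    }
}
```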
The streaming events are read using KafkaIO, and the StreamReader.read()
method is implemented like this:
public static KafkaIO.Read<String, String> read() {
    return KafkaIO.<String, String>read()
            .withBootstrapServers(Constants.BOOTSTRAP_SERVER)
            .withTopics(Constants.KAFKA_TOPICS)
            .withConsumerConfigUpdates(Constants.CONSUMER_PROPERTIES)
            .withKeyDeserializer(StringDeserializer.class)
            .withValueDeserializer(StringDeserializer.class);
}
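For context, the closest built-in mechanism I have found is KafkaIO's `commitOffsetsInFinalize()`, which commits offsets back to Kafka when the runner finalizes a checkpoint rather than per element, so I am not sure it gives the exact semantics I describe below. A sketch of the read configured that way (assuming a `group.id` is set and auto-commit is disabled in `CONSUMER_PROPERTIES`):

```java
// Sketch, not a drop-in: commitOffsetsInFinalize() commits offsets only
// after the runner has durably checkpointed the read elements.
public static KafkaIO.Read<String, String> read() {
    return KafkaIO.<String, String>read()
            .withBootstrapServers(Constants.BOOTSTRAP_SERVER)
            .withTopics(Constants.KAFKA_TOPICS)
            .withConsumerConfigUpdates(Constants.CONSUMER_PROPERTIES)
            .withKeyDeserializer(StringDeserializer.class)
            .withValueDeserializer(StringDeserializer.class)
            .commitOffsetsInFinalize();
}
```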
After we read a streamed event/message through KafkaIO, we can commit the offset.
What I need to do is commit the offset manually, inside the last Commit offset
PTransform, after all the previous PTransforms have executed.
The reason is that I do some conversions, mappings and persisting in the middle of the pipeline, and only when all of that completes without failing do I want to commit the offset. That way, if processing fails partway through, I can consume the same record/event again and reprocess it.
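To illustrate the bookkeeping I imagine the CommitOffset stage would need: because elements can finish the pipeline out of order, it can only safely commit the end of the contiguous prefix of completed offsets per partition. This is a hypothetical, Beam-agnostic helper (not part of KafkaIO), using the Kafka convention that committing offset N means "next fetch starts at N":

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeSet;

// Hypothetical sketch: tracks which offsets finished the whole pipeline and
// reports the next offset that is safe to commit, i.e. one past the end of
// the contiguous completed prefix for that partition.
class OffsetTracker {
    private final Map<Integer, Long> nextExpected = new HashMap<>();      // partition -> lowest offset not yet completed
    private final Map<Integer, TreeSet<Long>> pending = new HashMap<>();  // completed but not yet contiguous

    OffsetTracker(Map<Integer, Long> startOffsets) {
        nextExpected.putAll(startOffsets);
    }

    /**
     * Records that {@code offset} on {@code partition} finished processing.
     * Returns the offset to commit (the next offset to read), or -1 if the
     * contiguous prefix did not advance.
     */
    long complete(int partition, long offset) {
        TreeSet<Long> set = pending.computeIfAbsent(partition, p -> new TreeSet<>());
        set.add(offset);
        long next = nextExpected.get(partition);
        boolean advanced = false;
        // Drain the contiguous run starting at the lowest outstanding offset.
        while (!set.isEmpty() && set.first() == next) {
            set.pollFirst();
            next++;
            advanced = true;
        }
        nextExpected.put(partition, next);
        return advanced ? next : -1;
    }
}
```

With this, a failure before `complete()` is called leaves the offset uncommitted, so the record is redelivered and reprocessed, which is exactly the at-least-once behaviour I am after.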
My question is: how do I commit the offset manually? I would appreciate any resources/sample code.