I am using Spark Structured Streaming to consume data from a Kafka topic and write it to another Kafka sink.

I want to store the offsets twice: once when reading from the topic, and again when writing the data to the output sink. The second is already possible by specifying a checkpoint directory location.

Is it also possible to write the offsets consumed while subscribing to the topic?

1 Answer

You can use a StreamingQueryListener. Register the listener on your Spark session before starting the stream:

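import org.apache.spark.sql.streaming.StreamingQueryListener
import org.apache.spark.sql.streaming.StreamingQueryListener.{QueryProgressEvent, QueryStartedEvent, QueryTerminatedEvent}

spark.streams.addListener(new StreamingQueryListener() {

  // Fired once when the query starts; this event carries no offsets.
  override def onQueryStarted(event: QueryStartedEvent): Unit = {}

  // Fired after each micro-batch; the progress report carries the offsets per source.
  override def onQueryProgress(event: QueryProgressEvent): Unit = {
    // insert code here to log the offsets in addition to Spark's checkpoint,
    // e.g. event.progress.sources.foreach(s => println(s"${s.startOffset} -> ${s.endOffset}"))
  }

  override def onQueryTerminated(event: QueryTerminatedEvent): Unit = {}
})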
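
As a rough sketch of what the logging part could look like, the listener below appends one line per source and micro-batch to a local file. The names `OffsetLogging`, `OffsetLogListener`, `offsetLine`, `register` and the `logPath` parameter are made up for illustration, and writing to a local file is only an assumption; you could just as well write to a database or another Kafka topic. The offsets arrive as the JSON strings Spark reports in `event.progress.sources`.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Paths, StandardOpenOption}

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.streaming.StreamingQueryListener
import org.apache.spark.sql.streaming.StreamingQueryListener.{QueryProgressEvent, QueryStartedEvent, QueryTerminatedEvent}

object OffsetLogging {

  // Pure helper (hypothetical): formats one log line per source and micro-batch.
  // startOffset and endOffset are the JSON strings reported by Spark for that source.
  def offsetLine(batchId: Long, source: String, startOffset: String, endOffset: String): String =
    s"batch=$batchId source=$source start=$startOffset end=$endOffset"

  // Hypothetical listener that appends the consumed offsets to a local file.
  class OffsetLogListener(logPath: String) extends StreamingQueryListener {

    // The start event carries no offsets, so there is nothing to log here.
    override def onQueryStarted(event: QueryStartedEvent): Unit = ()

    // Called after each micro-batch with a progress report for every source.
    override def onQueryProgress(event: QueryProgressEvent): Unit = {
      val progress = event.progress
      progress.sources.foreach { s =>
        val line = offsetLine(progress.batchId, s.description, s.startOffset, s.endOffset) +
          System.lineSeparator()
        // Create the file on first use, then append.
        Files.write(
          Paths.get(logPath),
          line.getBytes(StandardCharsets.UTF_8),
          StandardOpenOption.CREATE,
          StandardOpenOption.APPEND)
      }
    }

    override def onQueryTerminated(event: QueryTerminatedEvent): Unit = ()
  }

  // Register the listener before calling start() on the streaming query.
  def register(spark: SparkSession, logPath: String): Unit =
    spark.streams.addListener(new OffsetLogListener(logPath))
}
```

You would call `OffsetLogging.register(spark, "/some/path/offsets.log")` (the path is a placeholder) before starting the query. Keep in mind that progress events are delivered asynchronously after a batch has been processed, so this log is a secondary record for monitoring and not a transactional replacement for the checkpoint.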
Michael Heil