
If we specify the starting position in the EventHubsConf like so:

EventHubsConf(ConnectionStringBuilder(eventHubConnectionString).build)
  .setStartingPosition(EventPosition.fromStartOfStream)
  // or, alternatively:
  // .setStartingPosition(EventPosition.fromEndOfStream)

And also specify the checkpoint location in the stream writer:

streamingInputDF
  .writeStream
  .option("checkpointLocation", checkpointLocation)
  ...

After a restart, does setStartingPosition become irrelevant because the checkpoint is always used as the point from which reading resumes?

Thanks.

Gadam

1 Answer


When the streaming query restarts, it resumes from the offsets stored in the checkpoint files, so setStartingPosition only takes effect when no checkpoint exists yet for the query.

Interestingly, this is not specifically mentioned in the Structured Streaming Event Hubs integration guide; the DStreams guide, however, states:

"The connector fully integrates with the Structured Streaming checkpointing mechanism. You can recover the progress and state of you query on failures by setting a checkpoint location in your query. This checkpoint location has to be a path in an HDFS compatible file system, and can be set as an option in the DataStreamWriter when starting a query."

Make sure to follow the general guidance on checkpoint recovery.
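To make the interaction concrete, here is a minimal sketch of a complete query that sets both a starting position and a checkpoint location. It assumes the azure-event-hubs-spark connector is on the classpath; the environment variable name, the checkpoint path, and the console sink are hypothetical choices for illustration and are not part of the question.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.eventhubs.{ConnectionStringBuilder, EventHubsConf, EventPosition}

object CheckpointedEventHubsQuery {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder
      .appName("eventhubs-checkpoint-sketch")
      .getOrCreate()

    // Assumption: the connection string (including the EntityPath) is
    // supplied through this hypothetical environment variable.
    val eventHubConnectionString = sys.env("EVENTHUB_CONNECTION_STRING")

    // Assumption: a hypothetical path; in production this should be a
    // location on an HDFS-compatible file system.
    val checkpointLocation = "/tmp/checkpoints/eventhubs-sketch"

    // The starting position applies on the first run, when the checkpoint
    // directory does not yet contain offsets for this query.
    val ehConf = EventHubsConf(ConnectionStringBuilder(eventHubConnectionString).build)
      .setStartingPosition(EventPosition.fromEndOfStream)

    val streamingInputDF = spark.readStream
      .format("eventhubs")
      .options(ehConf.toMap)
      .load()

    // On a restart with the same checkpointLocation, the query resumes from
    // the offsets recorded there rather than from the configured position.
    val query = streamingInputDF.writeStream
      .format("console")
      .option("checkpointLocation", checkpointLocation)
      .start()

    query.awaitTermination()
  }
}
```

If the intent is to honour a changed starting position, one option is to start the query with a new, empty checkpoint location, at the cost of losing the previously recorded progress.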

Michael Heil
  • Thanks. Can you link to the DStreams guide where it says that, please? I am curious about the 'fromEndOfStream' scenario, where the checkpoint might point to an older offset that is before the end of the stream. It seems counterintuitive that it reads from an older location than the end of the stream in this case. – Gadam Feb 20 '21 at 04:00
  • I have edited my answer, provided the link and quoted the relevant part about checkpointing. – Michael Heil Feb 20 '21 at 07:53