In my Structured Streaming job, the watermark delay is set to 1 hour through the API.

I am reading the following values in a StreamingQueryListener:

**event: StreamingQueryListener.QueryProgressEvent**
triggerTime = Instant.parse(event.progress.timestamp)
watermarkTime = Instant.parse(event.progress.eventTime.getOrDefault("watermark", ""))

The eventTime map is declared in the Spark documentation as shown at the end of this post. Because I read it inside a listener, I suspect that by the time the listener runs, Spark has already moved on to another batch.

So I compute triggerTime.getEpochSecond - watermarkTime.getEpochSecond - 3600

In the listener, this expression gives a negative result. I take this to mean that the watermark in the eventTime map reflects the latest batch rather than the batch in which the trigger fired, because the listener runs late when the ListenerBus queue backs up.

Logically it should be 0: the first term, triggerTime.getEpochSecond - watermarkTime.getEpochSecond, should be 3600 for each trigger, and I then subtract the 3600-second watermark delay from it.

But because the eventTime map gives the latest watermark while the trigger time is older, the difference comes out below 3600.

When there is a genuinely large processing delay, the value goes positive, but under normal conditions with no delay it stays negative.
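For reference, here is a minimal sketch of the calculation as a standalone Scala helper, so the arithmetic can be checked outside the listener. The names WatermarkLag and lagSeconds are made up for illustration, and the 3600-second constant assumes the 1-hour watermark delay mentioned above. It returns None when the map has no "watermark" entry or a value fails to parse, since Instant.parse("") would throw.

```scala
import java.time.Instant
import java.util.{Map => JMap}
import scala.util.Try

// Hypothetical helper, not part of Spark: reproduces the expression from the question.
object WatermarkLag {
  // Assumption: the watermark delay configured on the stream is 1 hour.
  val WatermarkDelaySeconds: Long = 3600L

  // timestamp: the value of event.progress.timestamp (ISO-8601 string)
  // eventTime: the value of event.progress.eventTime
  def lagSeconds(timestamp: String, eventTime: JMap[String, String]): Option[Long] =
    for {
      raw       <- Option(eventTime.get("watermark"))   // None if the key is absent
      watermark <- Try(Instant.parse(raw)).toOption     // None if the value is not ISO-8601
      trigger   <- Try(Instant.parse(timestamp)).toOption
    } yield trigger.getEpochSecond - watermark.getEpochSecond - WatermarkDelaySeconds
}
```

With a trigger time of 2024-01-01T12:00:00.000Z, a watermark of 11:00:00 gives Some(0), and a watermark of 11:10:00 gives Some(-600), which is the kind of negative value I am seeing.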

Is my understanding correct that this is caused by the StreamingQueryListener running late? I cannot see any other value that would make this expression negative.

val eventTime: ju.Map[String, String]
vipin