Is it possible to clear the current watermark in a DataStream?

Example input for a month-long watermark with no allowed lateness:

[
  { timestamp: '10/2018' },
  { timestamp: '11/2018' },
  { timestamp: '11/2018', clearState: true },
  { timestamp: '9/2018' }
]

Normally, the '9/2018' record would be thrown out as it is late. Is there a way to programmatically reset the watermark state when the clearState message is seen?

austin_ce
  • What are you trying to achieve? Window state gets discarded as soon as the watermark has passed, so even if you could reset the watermark, the windows would already have lost their state. – gcandal Nov 06 '18 at 10:35

1 Answer

Watermarks are not supposed to go backwards -- it's undefined what will happen, and in practice it's a bad idea. There are, however, various ways to accommodate late data.

If you are using the window API, Flink will clear any window state once the allowed lateness has expired for a window. If you want more control than this, consider using a ProcessFunction, which will allow/require you to manage state (and timers) explicitly.
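To illustrate what "managing state explicitly" could look like for the question's input, here is a minimal sketch in plain Java. It is not Flink code and uses no Flink API. The class `PerKeyResetSketch`, the `Event` type, the `month` helper, and the choice to let a `clearState` record always reset its key are all assumptions made for illustration. In a real job, the map would correspond to keyed state held by a ProcessFunction on a keyed stream.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Plain-Java model (not Flink API) of per-key lateness tracking with an explicit reset. */
final class PerKeyResetSketch {

    /** Hypothetical event: a key, an event time in whole months, and a reset flag. */
    static final class Event {
        final String key;
        final int month;
        final boolean clearState;

        Event(String key, int month, boolean clearState) {
            this.key = key;
            this.month = month;
            this.clearState = clearState;
        }
    }

    // Highest event time accepted so far for each key; stands in for keyed state.
    private final Map<String, Integer> maxMonthByKey = new HashMap<>();

    /** Converts a year and month into a single comparable month count. */
    static int month(int year, int monthOfYear) {
        return year * 12 + monthOfYear;
    }

    /** Returns true if the event is accepted, false if it is dropped as late. */
    boolean process(Event e) {
        if (e.clearState) {
            // Assumption: a reset record always wins and wipes the key's history.
            maxMonthByKey.remove(e.key);
            return true;
        }
        Integer max = maxMonthByKey.get(e.key);
        if (max != null && e.month < max) {
            return false; // older than what this key has already seen, no allowed lateness
        }
        maxMonthByKey.put(e.key, e.month);
        return true;
    }

    public static void main(String[] args) {
        PerKeyResetSketch sketch = new PerKeyResetSketch();
        List<Event> input = List.of(
                new Event("k", month(2018, 10), false),
                new Event("k", month(2018, 11), false),
                new Event("k", month(2018, 11), true),
                new Event("k", month(2018, 9), false));
        for (Event e : input) {
            System.out.println("month=" + e.month + " accepted=" + sketch.process(e));
        }
    }
}
```

With the question's four records, all four are accepted, because the `clearState` record wipes the history for that key before the '9/2018' record arrives. Without the `clearState` record, the '9/2018' record is rejected. Other keys are unaffected by a reset, since each key has its own entry.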

David Anderson
  • Thanks for the response! Yes, we don't want to clear watermarks to rewrite individual history - we want to reset the history for a particular key so we can start processing it again fresh. This would be used to backfill data for one key while other keys continue to be processed. – austin_ce Nov 06 '18 at 14:11
  • And if that is not possible, and we have to just use a new key, is it possible to clean out state for keys that have not been used for some amount of time? – austin_ce Nov 06 '18 at 14:32