
Consider a simple aggregation with a window defined without any watermark, for example:

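```scala
import org.apache.spark.sql.functions.{col, count, window}

df
  .groupBy(window(col("time"), "30 minutes", "10 minutes").as("time"))
  .agg(count("*")) // example aggregation; the original elides the actual one
```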

Here the window is 30 minutes long and the sliding interval is 10 minutes.

  • Q1. Does that mean the window slides every 10 minutes?
  • Q2. If so, isn't that somewhat similar to a watermark?
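A minimal sketch, in plain Scala without Spark, of which windows a single event time falls into under this 30-minute/10-minute configuration. `windowsFor` is a hypothetical helper, times are plain minutes, and window starts are assumed to be aligned to multiples of the slide (Spark aligns windows to 1970-01-01 00:00:00 UTC by default, with an optional `startTime` offset).

```scala
object SlidingWindows {
  // Hypothetical helper (not a Spark API): every window [start, start + windowLen)
  // that contains t, with window starts aligned to multiples of slide.
  def windowsFor(t: Long, windowLen: Long = 30, slide: Long = 10): List[(Long, Long)] = {
    val lastStart = Math.floorDiv(t, slide) * slide // latest window start <= t
    // Walk back one slide at a time while the window still covers t.
    (lastStart until (t - windowLen) by -slide).toList.map(s => (s, s + windowLen)).sorted
  }

  def main(args: Array[String]): Unit = {
    // An event at minute 25 lands in three overlapping 30-minute windows.
    println(windowsFor(25)) // List((0,30), (10,40), (20,50))
  }
}
```

Each event lands in 30 / 10 = 3 overlapping windows: a new window opens every 10 minutes rather than one window replacing the previous one. Nothing in this model ever discards an old window; that is the part a watermark adds.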
blackbishop
supernatural
  • Okay, and one more finding: if we don't use a watermark in the above code example, the state will keep growing with new incoming records. Thanks @thebluephantom – supernatural Jan 27 '21 at 10:56

1 Answer

  1. Yes, it slides every 10 minutes (the sliding interval): a new 30-minute window starts every 10 minutes, so the windows overlap. You do not say whether you are using event time or ingestion time. If you use event time, late and out-of-order data is handled by updating the windows it belongs to as time goes by.

  2. Following on from the previous question: no, this is not the same as watermarking. With a watermark, data that arrives later than a given threshold is dropped, so the updating described above is bounded in time. That is to say, windows older than the watermark are no longer updated.
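To illustrate the difference, here is a toy model in plain Scala, explicitly not Spark's implementation, of a sliding-window count with an optional watermark. `WindowedCounter`, its parameters and the event times are hypothetical; the watermark is taken as "max event time seen minus the allowed delay", windows ending at or before it are closed, late updates to closed windows are dropped, and closed windows are evicted from state.

```scala
import scala.collection.mutable

// Toy model (NOT Spark's implementation) of a sliding-window count with an optional
// watermark. Times are plain minutes; windowLen, slide and delay are illustrative.
class WindowedCounter(windowLen: Long, slide: Long, delay: Option[Long]) {
  val state = mutable.Map.empty[(Long, Long), Long] // (start, end) -> count
  var droppedUpdates = 0                            // (record, window) updates rejected as too late
  private var maxEventTime: Option[Long] = None

  // Watermark = max event time seen so far minus the allowed delay (None without a watermark).
  def watermark: Option[Long] = for (d <- delay; m <- maxEventTime) yield m - d

  def add(t: Long): Unit = {
    val lastStart = Math.floorDiv(t, slide) * slide
    val windows = (lastStart until (t - windowLen) by -slide).map(s => (s, s + windowLen))
    for (w <- windows) {
      // A window ending at or before the watermark is closed: late data for it is dropped.
      if (watermark.exists(wm => w._2 <= wm)) droppedUpdates += 1
      else state(w) = state.getOrElse(w, 0L) + 1
    }
    // Simplification: the watermark advances per record here (Spark advances it per micro-batch).
    maxEventTime = Some(maxEventTime.fold(t)(math.max(_, t)))
    // Evict closed windows; with no watermark nothing is evicted and state grows without bound.
    watermark.foreach(wm => state --= state.keys.filter(_._2 <= wm).toList)
  }
}

object WatermarkDemo {
  def main(args: Array[String]): Unit = {
    val events = List(5L, 25L, 70L, 8L) // the event at minute 8 arrives after minute 70
    val noWatermark = new WindowedCounter(30, 10, None)
    val withWatermark = new WindowedCounter(30, 10, Some(20L))
    events.foreach { t => noWatermark.add(t); withWatermark.add(t) }
    println(s"no watermark: ${noWatermark.state.size} windows in state")
    println(s"20-minute watermark: ${withWatermark.state.size} windows, ${withWatermark.droppedUpdates} late updates dropped")
  }
}
```

With these four events the model keeps 8 windows without a watermark (and the late event at minute 8 still updates its windows), but only 3 windows with a 20-minute delay, where the late event is rejected from all three of its windows. In Spark itself the watermark is declared with `Dataset.withWatermark(eventTime, delayThreshold)` before the `groupBy`, e.g. `df.withWatermark("time", "20 minutes")`.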

thebluephantom
  • Hi @thebluephantom, if we don't set any watermark, does that mean the state will keep growing with new incoming data, in addition to the previous data? – supernatural Jan 27 '21 at 02:13
  • That is my understanding, if using complete output mode, and eventually an OOM will result; hence watermarking, which drops data. This is not always well explained in the docs, IMHO. – thebluephantom Jan 27 '21 at 08:48
  • Thanks @thebluephantom. I have one more question regarding the state; could you please take a look at it? More specifically: https://stackoverflow.com/questions/65917336/does-the-state-also-gets-removed-on-event-timeout-with-mapgroupswithstate-flatma – supernatural Jan 27 '21 at 10:54
  • Will look at it later. – thebluephantom Jan 27 '21 at 11:09