
I have a dataset which looks like this:

+---+-------------------------------+--------+
|key|value                          |someData|
+---+-------------------------------+--------+
|1  |AAA                            |5       |
|1  |VVV                            |6       |
|1  |DDDD                           |8       |
|3  |rrerw                          |9       |
|4  |RRRRR                          |13      |
|6  |AAAAABB                        |15      |
|6  |C:\Windows\System32\svchost.exe|20      |
+---+-------------------------------+--------+

Now I apply the aggregate function avg twice, first over an ordered window and then over an unordered one, and the results are not the same. For example:

WindowSpec windowSpec = Window.orderBy(col("someData")).partitionBy(col("key"));
rawMapping.withColumn("avg", avg("someData").over(windowSpec)).show(false);

+---+-------------------------------+--------+-----------------+
|key|value                          |someData|avg              |
+---+-------------------------------+--------+-----------------+
|1  |AAA                            |5       |5.0              |
|1  |VVV                            |6       |5.5              |
|1  |DDDD                           |8       |6.333333333333333|
|6  |AAAAABB                        |15      |15.0             |
|6  |C:\Windows\System32\svchost.exe|20      |17.5             |
|3  |rrerw                          |9       |9.0              |
|4  |RRRRR                          |13      |13.0             |
+---+-------------------------------+--------+-----------------+
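
The running values in the avg column are consistent with Spark's documented default: when a window spec has an ordering but no explicit frame, Spark uses RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, so each row only sees the rows of its partition whose someData is less than or equal to its own. Below is a minimal plain-Java sketch (no Spark involved; the class and method names are hypothetical) that recomputes that column under this assumption, using the partitions from the table above:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Plain-Java simulation (no Spark involved) of the default frame Spark uses when a
// window has an ordering but no explicit frame:
// RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW.
public class DefaultFrameDemo {

    // For each row, average every value in the partition whose ordering value
    // (here someData itself, since we order by and average the same column)
    // is less than or equal to the current row's value.
    static double[] rangeToCurrentRowAverage(int[] values) {
        double[] result = new double[values.length];
        for (int i = 0; i < values.length; i++) {
            double sum = 0;
            int count = 0;
            for (int v : values) {
                // RANGE frame: rows with an equal ordering value (peers) are included too.
                if (v <= values[i]) {
                    sum += v;
                    count++;
                }
            }
            result[i] = sum / count;
        }
        return result;
    }

    public static void main(String[] args) {
        // Partitions from the question: key -> someData values.
        Map<Integer, int[]> partitions = new LinkedHashMap<>();
        partitions.put(1, new int[] {5, 6, 8});
        partitions.put(6, new int[] {15, 20});
        partitions.put(3, new int[] {9});
        partitions.put(4, new int[] {13});

        for (Map.Entry<Integer, int[]> entry : partitions.entrySet()) {
            int[] values = entry.getValue();
            double[] averages = rangeToCurrentRowAverage(values);
            for (int i = 0; i < values.length; i++) {
                System.out.println(entry.getKey() + " " + values[i] + " " + averages[i]);
            }
        }
    }
}
```

Because this is a RANGE frame rather than a ROWS frame, rows with the same someData would all receive the same average; the sample data has no ties, so that detail does not show up in the table.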

WindowSpec windowSpec2 = Window.partitionBy(col("key"));
rawMapping.withColumn("avg", avg("someData").over(windowSpec2)).show(false);

+---+-------------------------------+--------+-----------------+
|key|value                          |someData|avg              |
+---+-------------------------------+--------+-----------------+
|1  |AAA                            |5       |6.333333333333333|
|1  |VVV                            |6       |6.333333333333333|
|1  |DDDD                           |8       |6.333333333333333|
|6  |AAAAABB                        |15      |17.5             |
|6  |C:\Windows\System32\svchost.exe|20      |17.5             |
|3  |rrerw                          |9       |9.0              |
|4  |RRRRR                          |13      |13.0             |
+---+-------------------------------+--------+-----------------+

When the window is ordered, the aggregate function behaves like a "sliding window" (a running average over the rows seen so far). Why is this happening, and more importantly, is it a bug or a feature?

antonpuz
  • Possible duplicate of [What's the default window frame for window functions](https://stackoverflow.com/questions/47130030/whats-the-default-window-frame-for-window-functions) – 10465355 Feb 28 '19 at 15:06
  • These are the correct results for the code you posted. A "sliding window" only makes sense over an ordered partition; how would you slide a window over an unordered one? – shiqin zhang Mar 01 '19 at 03:19
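
As the comments suggest, it is the frame, not avg itself, that decides which rows each average covers. The plain-Java sketch below (hypothetical names, no Spark) simulates a ROWS frame over one ordered partition, with Integer.MIN_VALUE and Integer.MAX_VALUE standing in for unbounded preceding and following. An unbounded-on-both-sides frame gives every row the partition average, which matches the unordered output; a frame of (-1, 0) is a genuine two-row sliding window. In the Spark Java API, as far as I know (Spark 2.1+), the full-partition frame on an ordered window would be written as Window.partitionBy(col("key")).orderBy(col("someData")).rowsBetween(Window.unboundedPreceding(), Window.unboundedFollowing()).

```java
import java.util.Arrays;

// Plain-Java sketch (no Spark) of a ROWS frame: for row i of an ordered partition,
// average the rows from i + start to i + end, clamped to the partition bounds.
public class RowsFrameDemo {
    // Sentinels standing in for UNBOUNDED PRECEDING / UNBOUNDED FOLLOWING.
    static final int UNBOUNDED_PRECEDING = Integer.MIN_VALUE;
    static final int UNBOUNDED_FOLLOWING = Integer.MAX_VALUE;

    // Assumes start <= 0 <= end, so the frame always contains the current row.
    static double[] rowsBetweenAverage(int[] ordered, int start, int end) {
        double[] result = new double[ordered.length];
        for (int i = 0; i < ordered.length; i++) {
            // Long arithmetic avoids int overflow when adding the sentinels.
            int from = (int) Math.max(0L, (long) i + start);
            int to = (int) Math.min(ordered.length - 1L, (long) i + end);
            double sum = 0;
            for (int j = from; j <= to; j++) {
                sum += ordered[j];
            }
            result[i] = sum / (to - from + 1);
        }
        return result;
    }

    public static void main(String[] args) {
        int[] key1 = {5, 6, 8}; // someData for key = 1, ordered by someData
        // Whole partition: every row gets the partition average.
        System.out.println(Arrays.toString(
                rowsBetweenAverage(key1, UNBOUNDED_PRECEDING, UNBOUNDED_FOLLOWING)));
        // Growing frame: a running average, like the ordered-window output.
        System.out.println(Arrays.toString(
                rowsBetweenAverage(key1, UNBOUNDED_PRECEDING, 0)));
        // Previous row and current row only: a true sliding window.
        System.out.println(Arrays.toString(rowsBetweenAverage(key1, -1, 0)));
    }
}
```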
