
With Spark SQL's window functions, I need to partition by multiple columns to run my data queries, as follows:

val w = Window.partitionBy($"a").partitionBy($"b").rangeBetween(-100, 0)

I currently do not have a test environment (I'm working on setting this up), but as a quick question: is this currently supported by Spark SQL's window functions, or will it not work?

Eric Staner

2 Answers


This won't work. The second partitionBy will overwrite the first one. Both partition columns have to be specified in the same call:

val w = Window.partitionBy($"a", $"b").rangeBetween(-100, 0)
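The overwrite behavior can be illustrated with a toy builder (a hypothetical `SpecBuilder`, not Spark's actual `WindowSpec` class): each `partitionBy` call returns a spec carrying only the columns from that call, so chaining replaces rather than appends.

```scala
// Toy model (NOT Spark's real WindowSpec) of replace-not-append semantics:
// each partitionBy call returns a spec holding ONLY that call's columns.
case class SpecBuilder(partitionCols: Seq[String] = Seq.empty) {
  def partitionBy(cols: String*): SpecBuilder = SpecBuilder(cols)
}

// Chained calls: the second call discards "a".
val chained = SpecBuilder().partitionBy("a").partitionBy("b")

// Single call with both columns: both are kept.
val combined = SpecBuilder().partitionBy("a", "b")
```

Here `chained.partitionCols` contains only `"b"`, while `combined.partitionCols` contains both `"a"` and `"b"` — mirroring why the two columns must be passed in one call.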
zero323

If you use the same partition columns in several places, you can assign them to a list once and pass that list as the argument to `partitionBy`:

import org.apache.spark.sql.functions.col

val partitioncolumns = List("a", "b")
// partitionBy's varargs overload takes Column*, so map the names to Columns
// before expanding the list with ": _*".
val w = Window.partitionBy(partitioncolumns.map(col): _*).rangeBetween(-100, 0)

The `: _*` suffix expands the list into varargs, which is the argument type `partitionBy` takes, so the code works the way you want.
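The `: _*` expansion itself can be shown with plain Scala, no Spark required (the `partitionKey` function below is a made-up stand-in with the same varargs shape as `Window.partitionBy(cols: Column*)`):

```scala
// Hypothetical varargs function, analogous in shape to partitionBy(cols: Column*).
def partitionKey(cols: String*): String = cols.mkString(",")

val partitioncolumns = List("a", "b")

// ": _*" expands the list into individual varargs arguments,
// so this is equivalent to calling partitionKey("a", "b").
val key = partitionKey(partitioncolumns: _*)
```

Without `: _*` the compiler rejects the call, because a `List[String]` is a single argument, not a sequence of `String` arguments.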

Nikunj Kakadiya