
With Spark SQL's window functions, I need to partition by multiple columns to run my data queries, as follows:

val w = Window.partitionBy($"a").partitionBy($"b").rangeBetween(-100, 0)

I currently do not have a test environment (I'm working on setting this up), but as a quick question: is this currently supported by Spark SQL's window functions, or will it not work?

Eric Staner

2 Answers


This won't work. The second partitionBy will overwrite the first one. Both partition columns have to be specified in the same call:

val w = Window.partitionBy($"a", $"b").rangeBetween(-100, 0)
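The overwrite behavior can be illustrated with a toy builder (a hypothetical `SpecBuilder`, not Spark's actual `WindowSpec` class): each `partitionBy` call returns a spec carrying only the columns from that call, so chaining replaces rather than appends.

```scala
// Toy model (NOT Spark's real WindowSpec) of replace-not-append semantics:
// each partitionBy call returns a spec holding ONLY that call's columns.
case class SpecBuilder(partitionCols: Seq[String] = Seq.empty) {
  def partitionBy(cols: String*): SpecBuilder = SpecBuilder(cols)
}

// Chained calls: the second call discards "a".
val chained = SpecBuilder().partitionBy("a").partitionBy("b")

// Single call with both columns: both are kept.
val combined = SpecBuilder().partitionBy("a", "b")
```

Here `chained.partitionCols` contains only `"b"`, while `combined.partitionCols` contains both `"a"` and `"b"` — mirroring why the two columns must be passed in one call.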
zero323

If you use the same partition columns in several places, you can assign them to a list once and pass that list as the argument to `partitionBy`:

import org.apache.spark.sql.functions.col

val partitioncolumns = List("a", "b")
// partitionBy's varargs overload takes Column*, so map the names to Columns
// before expanding the list with ": _*".
val w = Window.partitionBy(partitioncolumns.map(col): _*).rangeBetween(-100, 0)

The `: _*` suffix expands the list into varargs, which is the argument type `partitionBy` takes, so the code works the way you want.
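The `: _*` expansion itself can be shown with plain Scala, no Spark required (the `partitionKey` function below is a made-up stand-in with the same varargs shape as `Window.partitionBy(cols: Column*)`):

```scala
// Hypothetical varargs function, analogous in shape to partitionBy(cols: Column*).
def partitionKey(cols: String*): String = cols.mkString(",")

val partitioncolumns = List("a", "b")

// ": _*" expands the list into individual varargs arguments,
// so this is equivalent to calling partitionKey("a", "b").
val key = partitionKey(partitioncolumns: _*)
```

Without `: _*` the compiler rejects the call, because a `List[String]` is a single argument, not a sequence of `String` arguments.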

Nikunj Kakadiya