
This is what I am using to pivot on two columns in a DataFrame, where I concatenate the two columns and then pivot on the combined value.

// Imports needed for udf and count (spark.implicits._ is assumed for the $ syntax)
import org.apache.spark.sql.functions.{count, udf}

// Define a UDF to concatenate two passed-in string values with a space
val concat = udf((first: String, second: String) => first + " " + second)

def main(args: Array[String]) {

    // pivot using the concatenated column
    domainDF.withColumn("combColumn", concat($"col1", $"col2"))
      .groupBy("someCol").pivot("combColumn").agg(count("combColumn")).show()

  }

My requirement is to make this functionality generic, so that any number of columns can be passed as variable arguments for concatenation. Can anyone provide a solution for this requirement? Thanks


1 Answer


Use the built-in concatenation function `concat` instead; it accepts a variable number of input columns. See the documentation.

In this case, you can do:

import org.apache.spark.sql.functions._

domainDF.withColumn("combColumn", concat(Seq($"col1", $"col2"):_*))
  .groupBy("someCol").pivot("combColumn").agg(count)

If you want a separator between the column values, use `concat_ws` instead. For example, to use a space: `concat_ws(" ", Seq(...): _*)`.
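
For example, the generic version above with a space separator would look like this (reusing the same placeholder pivotColumns list):

domainDF.withColumn("combColumn", concat_ws(" ", pivotColumns.map(col): _*))
  .groupBy("someCol").pivot("combColumn").agg(count("combColumn"))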


If you need to use a UDF due to other concerns, you can still accept a variable number of arguments by wrapping the columns in an array, see: Spark UDF with varargs. A minimal sketch of that approach follows below.
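
A sketch of the varargs-UDF approach, again assuming the placeholder pivotColumns list; the UDF name and the space separator are illustrative choices:

import org.apache.spark.sql.functions.{array, col, count, udf}

// The UDF receives all values as a single Seq and joins them with a space
val concatUdf = udf((values: Seq[String]) => values.mkString(" "))

domainDF.withColumn("combColumn", concatUdf(array(pivotColumns.map(col): _*)))
  .groupBy("someCol").pivot("combColumn").agg(count("combColumn"))

Naming it concatUdf also avoids shadowing the built-in concat function, which the original code's val concat would do.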

  • Thanks for your reply. Just one thing I need to check: what is the default separator in concat? In my case I am using a single space between the two columns. – A B Nov 12 '19 at 08:38
  • concat_ws is working absolutely fine with the first parameter as the separator, like: concat_ws(" ", pivotColumn map col: _*). – A B Nov 12 '19 at 08:43
  • @AB: As you noted, `concat` will not use any separator, and `concat_ws` should be used where one is wanted. I added this information to the answer as well. :) – Shaido Nov 12 '19 at 09:03