
This is what I am using to pivot on two columns in a DataFrame, where I concatenate the two columns and then pivot on the combined value.

// Imports needed for udf and count (spark.implicits._ is assumed for the $ syntax)
import org.apache.spark.sql.functions.{count, udf}

// Define a UDF to concatenate two passed-in string values with a space
val concat = udf((first: String, second: String) => first + " " + second)

def main(args: Array[String]) {

    // pivot using the concatenated column
    domainDF.withColumn("combColumn", concat($"col1", $"col2"))
      .groupBy("someCol").pivot("combColumn").agg(count("combColumn")).show()

  }

My requirement is to make this functionality generic, so that any number of columns can be passed as variable arguments for concatenation. Can anyone provide a solution for this requirement? Thanks


1 Answer


Use the built-in concatenation function `concat` instead; it accepts a variable number of input columns. See the documentation.

In this case, you can do:

import org.apache.spark.sql.functions._

domainDF.withColumn("combColumn", concat(Seq($"col1", $"col2"):_*))
  .groupBy("someCol").pivot("combColumn").agg(count)

If you want a separator between the column values, use `concat_ws` instead. For example, to use a space: `concat_ws(" ", Seq(...): _*)`.
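
For example, the generic version above with a space separator would look like this (reusing the same placeholder pivotColumns list):

domainDF.withColumn("combColumn", concat_ws(" ", pivotColumns.map(col): _*))
  .groupBy("someCol").pivot("combColumn").agg(count("combColumn"))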


If you need to use a UDF due to other concerns, you can still accept a variable number of arguments by wrapping the columns in an array, see: Spark UDF with varargs. A minimal sketch of that approach follows below.
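
A sketch of the varargs-UDF approach, again assuming the placeholder pivotColumns list; the UDF name and the space separator are illustrative choices:

import org.apache.spark.sql.functions.{array, col, count, udf}

// The UDF receives all values as a single Seq and joins them with a space
val concatUdf = udf((values: Seq[String]) => values.mkString(" "))

domainDF.withColumn("combColumn", concatUdf(array(pivotColumns.map(col): _*)))
  .groupBy("someCol").pivot("combColumn").agg(count("combColumn"))

Naming it concatUdf also avoids shadowing the built-in concat function, which the original code's val concat would do.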

  • Thanks for your reply. Just one thing I need to check: what is the default separator in concat? In my case I am using a single space between the two columns. – A B Nov 12 '19 at 08:38
  • concat_ws is working absolutely fine with the first parameter as the separator, like: concat_ws(" ", pivotColumn map col: _*). – A B Nov 12 '19 at 08:43
  • @AB: As you noted, `concat` will not use any separator, and `concat_ws` should be used where one is wanted. I added this information to the answer as well. :) – Shaido Nov 12 '19 at 09:03