How can I create a column colMap of type ArrayType(StringType) whose value is an array of the names of the columns whose values are true?

I have the following input DataFrame:

+-----+-----+-----+
|col1 |col2 |col3 |
+-----+-----+-----+
|true |false|true |
|false|false|false|
|false|false|true |
+-----+-----+-----+

and I want to create the following output DataFrame:

+-----+-----+-----+------------+
|col1 |col2 |col3 |colMap      |
+-----+-----+-----+------------+
|true |false|true |[col1, col3]|
|false|false|false|[]          |
|false|false|true |[col3]      |
+-----+-----+-----+------------+

EDIT: I have found this duplicate question:

Spark scala get an array of type string from multiple columns

but I wonder whether there is a better way to achieve this output.

Mohana B C
Dariusz Krynicki

1 Answer

Instead of using a UDF to filter null values out of the array, you can use the built-in higher-order function filter.

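import org.apache.spark.sql.functions.{array, col, expr, when}
import spark.implicits._ // needed for toDF outside spark-shell

val df = Seq((true, false, true),
    (false, false, false),
    (false, false, true)).toDF("col1", "col2", "col3")

// For each column, emit its name when the value is true, otherwise null
df.withColumn("colMap", array(df.columns.map(c => when(col(c), c)): _*))
  // Drop the nulls with the SQL higher-order function filter (Spark 2.4+)
  .withColumn("colMap", expr("filter(colMap, c -> c is not null)"))
  .show(false)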

+-----+-----+-----+------------+
|col1 |col2 |col3 |colMap      |
+-----+-----+-----+------------+
|true |false|true |[col1, col3]|
|false|false|false|[]          |
|false|false|true |[col3]      |
+-----+-----+-----+------------+
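
If you are on Spark 3.0 or later, the same idea can probably be written in one step with the typed filter function from org.apache.spark.sql.functions instead of a SQL string in expr. The following is only a sketch under that version assumption; the object name ColMapSketch, the helper withColMap, and the local SparkSession settings are made up for illustration and are not part of the question.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{array, col, filter, lit, when}

// Hypothetical wrapper object, used only to make the sketch self-contained.
object ColMapSketch {

  // Adds a "colMap" column holding the names of the boolean columns that are true.
  // Assumes every column of df is of BooleanType.
  def withColMap(df: DataFrame): DataFrame = {
    // One array slot per column: the column name if the value is true, otherwise null.
    val candidates = array(df.columns.map(c => when(col(c), lit(c))): _*)
    // Drop the null slots with the typed higher-order function (Spark 3.0+).
    df.withColumn("colMap", filter(candidates, x => x.isNotNull))
  }

  def main(args: Array[String]): Unit = {
    // Local session for illustration only; reuse your existing session in practice.
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("colMapSketch")
      .getOrCreate()
    import spark.implicits._

    val df = Seq(
      (true, false, true),
      (false, false, false),
      (false, false, true)
    ).toDF("col1", "col2", "col3")

    withColMap(df).show(false)
    spark.stop()
  }
}
```

This builds the array and filters it inside a single withColumn call, so the intermediate column that still contains nulls never appears in the DataFrame. The lambda is checked by the Scala compiler rather than parsed from a string at runtime.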
Mohana B C