I do not want to use foldLeft or withColumn with when over all columns in a DataFrame. Instead I want a single select, as described in https://medium.com/@manuzhang/the-hidden-cost-of-spark-withcolumn-8ffea517c015, combined with an if/else expression and a column list passed as varargs. All I want is to replace empty array columns in a Spark DataFrame with null, using Scala. I am using size, but the comparison with zero (0) never seems to evaluate correctly.

val resDF2 = aggDF.select(cols.map { col =>
  (if (size(aggDF(col)) == 0) lit(null) else aggDF(col)).as(s"$col")
}: _*)

The expression if (size(aggDF(col)) == 0) lit(null) does not do what I intend: the code runs, but empty arrays are not replaced with null. Yet size(aggDF(col)) returns the correct length when I select it directly.

I am wondering what the silly issue is. Must be something I am obviously overlooking!

thebluephantom

1 Answer

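if/else does not work here because it is an ordinary Scala expression that is evaluated once, when the query is built, not once per row. size(aggDF(col)) is a Column, so size(aggDF(col)) == 0 compares the Column object itself with the integer 0; that is always false, and the else branch is always taken. With DataFrames you need when/otherwise and the Column comparison operator ===: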

val resDF2 = aggDF.select(cols.map { col =>
  when(size(aggDF(col)) === 0, lit(null)).otherwise(aggDF(col)).as(col)
}: _*)
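
To see why the if/else version from the question always takes the else branch, it may help to compare the types of the two comparisons. The following is a minimal sketch; the column name "xs" is a hypothetical placeholder, and no SparkSession is needed just to build the expressions.

```scala
import org.apache.spark.sql.Column
import org.apache.spark.sql.functions.{col, size}

// "xs" is a hypothetical array column name used only for illustration.
val length: Column = size(col("xs"))

// Scala equality on the Column object itself: a plain Boolean, decided once.
val plainEquals: Boolean = length == 0

// Spark's column operator: an unevaluated expression, checked per row at runtime.
val columnEquals: Column = length === 0

println(plainEquals)  // false
println(columnEquals) // prints the expression, not a Boolean result
```

Because plainEquals is already false before any data is read, if (plainEquals) ... else ... always returns the original column.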

The when/otherwise expression can be simplified further, because when without otherwise returns null for rows that do not match the condition (i.e. otherwise(lit(null)) is the default):

val resDF2 = aggDF.select(cols.map { col =>
  when(size(aggDF(col)) > 0, aggDF(col)).as(col)
}: _*)
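
The simplified variant can be tried end to end on a small DataFrame. This is only a sketch under stated assumptions: the local SparkSession, the sample data, and the column names nums and tags are made up for illustration, and cols is assumed to be the full list of column names.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, size, when}

// Local session purely for illustration.
val spark = SparkSession.builder()
  .master("local[1]")
  .appName("empty-array-to-null")
  .getOrCreate()
import spark.implicits._

// Hypothetical stand-in for aggDF: two array columns, each empty in one row.
val aggDF = Seq(
  (Seq(1, 2), Seq.empty[String]),
  (Seq.empty[Int], Seq("a"))
).toDF("nums", "tags")

// Assumption: every column of aggDF is an array column to be cleaned.
val cols = aggDF.columns.toSeq

// Keep an array only if it has at least one element; when without
// otherwise yields null for all remaining rows.
val resDF2 = aggDF.select(cols.map(c => when(size(col(c)) > 0, col(c)).as(c)): _*)

resDF2.show()
```

Everything stays inside a single select, so the plan does not grow with one projection per column as it would with repeated withColumn calls.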

See also https://stackoverflow.com/a/48074218/1138523

Raphael Roth