def myFunction(df: DataFrame): DataFrame = {
    val myList= List("a","b","c")

    df
      .withColumn("myFlag",
        if (myList.contains(df.select(col("columnName1")))) lit("true") else lit(false))
}

I want to write a function that takes a DataFrame and adds a column named "myFlag" to it.

"myFlag" should be true if the corresponding "columnName1" value is an element of "myList", and false otherwise.

For simplicity, the "columnName1" values and "myList" contain only Strings.

My function above does not work. Any suggestions?

Roll_156

1 Answer


This can be done using isin, which is defined on Column and takes a variable number of arguments:

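// Enables the $"..." column syntax; `spark` must be the SparkSession value in scope
import spark.implicits._

// isin takes varargs, so the list is expanded with `: _*`
df
  .withColumn("myFlag", $"columnName1".isin(myList: _*))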
Raphael Roth
  • Not sure what the $ is doing. It doesn't seem to work so please explain or update. – SummerEla Apr 15 '20 at 21:47
  • The $ symbol is used to refer to a column, so $"col1" refers to the values of "col1", and the subsequent operation (here, checking whether they are in your list) is applied to them. Alternatively, you can write col("col1"), which is the same thing to my knowledge. – Roll_156 Apr 16 '20 at 11:38
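
For readers who hit the same problem with $: the $"..." syntax only compiles after `import spark.implicits._`, and that import needs a SparkSession value actually named `spark` in scope. A minimal sketch of the whole function using col(...) instead, which needs no implicits, might look like the following. The object name `MyFlagExample`, the local session settings, and the sample data are assumptions made purely for illustration; they are not from the original posts.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

// Hypothetical wrapper object, used only to make the sketch self-contained.
object MyFlagExample {

  // Adds a Boolean "myFlag" column that is true where columnName1 is in myList.
  def myFunction(df: DataFrame): DataFrame = {
    val myList = List("a", "b", "c")
    // col("columnName1") builds a Column expression evaluated per row.
    // isin takes varargs, so the list is expanded with `: _*`.
    df.withColumn("myFlag", col("columnName1").isin(myList: _*))
  }

  def main(args: Array[String]): Unit = {
    // Assumption: a throwaway local session, for demonstration only.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("myFlagExample")
      .getOrCreate()
    import spark.implicits._ // needed here for toDF (and for $"..." if used)

    // Made-up sample data with a single String column.
    val df = Seq("a", "x", "c").toDF("columnName1")
    myFunction(df).show()

    spark.stop()
  }
}
```

The original attempt fails because the if expression runs once on the driver and compares the list against a DataFrame object, whereas isin builds a Column expression that Spark evaluates for every row. One caveat: under SQL semantics, a null in "columnName1" produces null in "myFlag" rather than false.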