I want to create a column called "quelle" that is filled with lit("XY") if another column contains a certain substring. I wrote the code like this:

df => df.withColumn("quelle", when(substring_of_other_column.contains("123"), lit("XY")))

But I get the error: Type mismatch: Required column, found boolean.

Any help will be appreciated, thank you.

Guru Stron

1 Answer

df.withColumn("quelle", when(col("id").contains("123"), lit("XY")).otherwise("AA")) should give you the result you are looking for.

I tried to reproduce the error you are getting. You most likely called contains on a plain string rather than on a Column, as in df.withColumn("quelle", when("id".contains("123"), lit("XY")).otherwise("AA")), which gives me exactly the error you describe:

error: type mismatch; found : Boolean required: org.apache.spark.sql.Column

Please check if this helps.
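
For anyone who wants to try the fix end to end, here is a minimal sketch. It assumes a local SparkSession and a made-up two-row DataFrame with a string column named "id"; the object name QuelleExample and the helper addQuelle are hypothetical and only there for illustration.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, lit, when}

object QuelleExample {
  // Hypothetical helper: adds "quelle" based on whether "id" contains "123".
  // col("id") is a Column, so .contains(...) returns a Column, which is what `when` requires.
  def addQuelle(df: DataFrame): DataFrame =
    df.withColumn(
      "quelle",
      when(col("id").contains("123"), lit("XY")).otherwise(lit("AA"))
    )

  def main(args: Array[String]): Unit = {
    // Assumption: a throwaway local session, just for this demo.
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("quelle-example")
      .getOrCreate()
    import spark.implicits._

    // Made-up sample data: "a123b" contains "123", "xyz" does not.
    val df = Seq("a123b", "xyz").toDF("id")

    // "a123b" gets "XY"; "xyz" falls through to the otherwise branch and gets "AA".
    addQuelle(df).show()

    spark.stop()
  }
}
```

If you drop the otherwise(...) call, rows that do not match the condition get null in "quelle" instead of a default value.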

linusRian
  • bro can we get married, thank you. it works now. could you explain to me what the problem was? im a scala noob so it'd be nice to know (: – MaryTheEngineer May 28 '21 at 18:27
  • 'when' takes a Column as its condition, so you must write col("col_name") or $"col_name" instead of just "col_name". Calling contains on a plain String returns a Boolean, not a Column. The signature is def when(condition: org.apache.spark.sql.Column, value: Any) – linusRian May 28 '21 at 18:41
  • @MaryTheEngineer, if you are new to Scala, a couple of things might help here. Scala is a strongly typed language, so the types must match. Try an IDE such as IntelliJ and press Ctrl+B on your code to jump to the underlying definition; it shows the function's parameter types clearly. – soMuchToLearnAndShare May 29 '21 at 06:52
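
The type point made in the comments can be seen by writing the types out explicitly. This is only a sketch: the object name ContainsTypes and the value names are hypothetical, and the column name "id" is taken from the answer.

```scala
import org.apache.spark.sql.Column
import org.apache.spark.sql.functions.{col, lit, when}

object ContainsTypes {
  // String.contains is the ordinary JVM method: it runs right away and yields a Boolean.
  // Here it just checks whether the literal text "id" contains "123", which it does not.
  val onString: Boolean = "id".contains("123")

  // Column.contains builds a Spark expression that is evaluated per row later.
  val onColumn: Column = col("id").contains("123")

  // Compiles, because the condition is a Column.
  val ok: Column = when(onColumn, lit("XY"))

  // Does not compile (type mismatch; found: Boolean, required: Column):
  // val broken = when(onString, lit("XY"))
}
```

Annotating intermediate values like this is a cheap way to get the compiler to point at the exact expression with the wrong type.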