I have a DataFrame and I want to check whether one of its columns contains at least one of a list of keywords:
from pyspark.sql import types as T
import pyspark.sql.functions as fn
key_labels = ["COMMISSION", "COM", "PRET", "LOAN"]
def containsAny(string, array):
    if len(string) == 0:
        return False
    else:
        return (any(word in string for word in array))
contains_udf = fn.udf(containsAny, T.BooleanType())
df = spark.createDataFrame([("COMMISSION", "1"), ("CAMMISSION", "2")], ("original", "id"))
df.withColumn("keyword_match", contains_udf(fn.col("original"),key_labels)).show()
When I run this code, I get the following error:
Py4JError: An error occurred while calling z:org.apache.spark.sql.functions.col.
Trace: py4j.Py4JException:
Method col([class java.util.ArrayList]) does not exist
What am I doing wrong?
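
For reference, the error seems to come from the fact that every argument passed to a UDF call is interpreted as a column, so the Python list `key_labels` ends up being handed to `col`. A minimal sketch of one possible workaround is below, assuming the keyword list is a fixed Python list that can be bound into the function instead of being passed as a UDF argument. The names `contains_any` and `add_keyword_match` are hypothetical and only for illustration; the Spark imports are deferred so the matching logic can be tried without a Spark session.

```python
key_labels = ["COMMISSION", "COM", "PRET", "LOAN"]


def contains_any(string, keywords=key_labels):
    """Return True if any keyword occurs as a substring of `string`."""
    # Treat None (a null cell) and the empty string as "no match".
    if not string:
        return False
    return any(word in string for word in keywords)


def add_keyword_match(df, column="original"):
    """Hypothetical helper: add a boolean `keyword_match` column to `df`."""
    # Imported here so the plain-Python logic above can be tried without Spark.
    import pyspark.sql.functions as fn
    from pyspark.sql import types as T

    # The UDF receives only the column value; the keyword list is bound
    # through the default argument and is never passed as a UDF argument.
    contains_udf = fn.udf(lambda s: contains_any(s), T.BooleanType())
    return df.withColumn("keyword_match", contains_udf(fn.col(column)))
```

With the `df` from the question, `add_keyword_match(df).show()` would be the equivalent of the failing `withColumn` call. Another option, if the keywords really have to be passed as an argument, might be to wrap them as a literal array column with `fn.array(*[fn.lit(k) for k in key_labels])`.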