I can specify checks in the transform decorator, such as a primary key check. Can I also specify a custom check, for example one that applies a lambda function? Thanks!
I read the documentation and couldn't find an existing check type that conforms to my use case.
EDIT: Here's a code example of what I'm trying to accomplish. I want to check whether an array column contains only distinct elements, and the transform check should raise a warning if my UDF returns false. Without checks, I would implement this with an extra column:
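from pyspark.sql import functions as F
from pyspark.sql.types import BooleanType

# The UDF must be defined before it is used. F.udf defaults to StringType,
# so declare BooleanType to get real true/false values.
@F.udf(returnType=BooleanType())
def check_for_distinct_array_elements(arr):
    # True when no element repeats; a null array gives null instead of raising
    if arr is None:
        return None
    return len(set(arr)) == len(arr)

df = (
    df
    .withColumn('my_array_col1', F.array(F.lit('first'), F.lit('second'), F.lit('third')))
    .withColumn('my_array_col2', F.array(F.lit('first'), F.lit('first')))
    .withColumn('custom_check1', check_for_distinct_array_elements(F.col('my_array_col1')))
    .withColumn('custom_check2', check_for_distinct_array_elements(F.col('my_array_col2')))
)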