I am trying to make sure that a particular column in a dataframe does not contain any illegal values (non-numeric data). For this purpose I am trying to use regex matching with rlike to collect the illegal values in the data.
I need to collect the values that contain letters, spaces, commas, or any other characters that are not digits. I tried:
spark.sql("select * from tabl where UPC not rlike '[0-9]*'").show()
but this doesn't work; it returns 0 rows.
Any help is appreciated. Thank you.
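A possible explanation, sketched in plain Python rather than Spark: rlike succeeds when the pattern matches anywhere in the value, and '[0-9]*' can match the empty string, so every value matches and "not rlike" filters everything out. The sample values below are made up for illustration, and Python's re.search is only standing in for rlike on the assumption that the two behave the same for these simple patterns.

```python
import re

# Hypothetical sample values standing in for the UPC column.
samples = ["12345", "12a45", "12 45", "1,234", "", "abc"]

# re.search succeeds if the pattern matches anywhere in the string.
# '[0-9]*' can match zero digits, so it matches every value,
# and negating it leaves nothing.
unanchored = [s for s in samples if not re.search(r"[0-9]*", s)]

# Anchoring with ^ and $ and requiring at least one digit means only
# values made up entirely of digits match; negating keeps the rest.
anchored = [s for s in samples if not re.search(r"^[0-9]+$", s)]

print(unanchored)  # []
print(anchored)    # ['12a45', '12 45', '1,234', '', 'abc']

# The same idea as a Spark SQL string (table and column names from the question).
query = "select * from tabl where UPC not rlike '^[0-9]+$'"
```

If that reasoning holds, passing the anchored query string to spark.sql should return the rows with non-numeric UPC values. Rows where UPC is null would probably still be left out, since a comparison against null is not true, so they may need a separate "UPC is null" check.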