
I am trying to use rlike() to validate a money value. The value may contain a dollar sign ($), commas (,), a decimal point (.), and digits before and after the decimal point, and there can be a negative sign before or after the $ sign. Below is the regular expression I came up with: ^\$?\-?[0-9]*\,?[0-9]*\.?[0-9]*$ It finds the expected matches when I test it on https://regex101.com/

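from pyspark.sql import SparkSession
from pyspark.sql.functions import col, lit

# Raw string so the backslashes reach the regex engine unchanged.
amount_pattern = r'^\$?\-?[0-9]*\,?[0-9]*\.?[0-9]*$'
# Keep only the rows whose AmountRecovered does NOT match, and label them.
invalid = cdf.withColumn("ErrorMessage", lit("Invalid Amount Recovered")) \
             .filter(~col("AmountRecovered").rlike(amount_pattern))
df = df.unionAll(invalid).distinct()
display(df)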

I also tried replacing ~ with == False, like this:

df = df.unionAll(cdf.withColumn("ErrorMessage", lit("Invalid Amount Recovered")) \
                    .filter(col("AmountRecovered").rlike(r'^\$?\-?[0-9]*\,?[0-9]*\.?[0-9]*$') == False)).distinct()

That does not work either.


Vaishnavi S
  • How is your regular expression not working? Are there inputs it's failing to match? If so, which inputs? Also, you can use `\d` anywhere you use `[0-9]`. `\d` is a shortcut for a number character. You also don't need a backslash before `-` or `,`, since those aren't special characters. – Benji Jun 28 '22 at 18:42
  • Hi @Benji, yes, the regular expression works on the testing sites, but it is not working in Databricks; it returns false for valid inputs. – Vaishnavi S Jun 28 '22 at 18:52
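
The two simplifications suggested in the comment above (\d in place of [0-9], and no backslash before - or ,) can be checked outside Spark. This is a minimal sketch using Python's `re` module rather than `rlike`; Spark evaluates `rlike` with Java regular expressions, and the assumption here is that both engines agree on constructs this simple. The sample inputs are made up for illustration and are not from the original data.

```python
import re

# Pattern from the question, with its original backslashes.
ESCAPED = r'^\$?\-?[0-9]*\,?[0-9]*\.?[0-9]*$'
# Same pattern written with \d and without the unneeded escapes.
SIMPLIFIED = r'^\$?-?\d*,?\d*\.?\d*$'

# Hypothetical sample inputs.
SAMPLES = ["$1,234.56", "$-5.00", "100", "abc", "12.3.4"]

def matches(pattern, text):
    """Return True if the anchored pattern matches the string."""
    return re.match(pattern, text) is not None

for s in SAMPLES:
    # Both columns should agree for every sample.
    print(repr(s), matches(ESCAPED, s), matches(SIMPLIFIED, s))
```

If the two columns agree, the extra backslashes are only noise and are unlikely to explain a difference between a regex tester and Databricks.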

1 Answer


I noticed two things wrong with your regular expression: it doesn't match a - before a $ (for an input like -$5.00) and it doesn't let you have multiple commas (for an input like $500,000,000,000).

I also simplified the expression a bit by removing unnecessary \'s and replacing [0-9] with \d.

Here's a tweaked pattern that should match your criteria better:

^-?\$?-?(\d*,)*\d*\.?\d*$

You can see it in action here: https://regexr.com/6om9n
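
The original and tweaked patterns can also be compared locally. This is a rough sketch with Python's `re` module, not with Spark itself: `rlike` uses Java's regex engine, and it is an assumption that the two engines behave the same on patterns this simple. The sample amounts are illustrative only.

```python
import re

ORIGINAL = r'^\$?-?[0-9]*,?[0-9]*\.?[0-9]*$'   # pattern from the question
TWEAKED = r'^-?\$?-?(\d*,)*\d*\.?\d*$'         # pattern from this answer

def is_match(pattern, text):
    """Return True if the anchored pattern matches the string."""
    return re.match(pattern, text) is not None

# Illustrative inputs, including the two cases named in the answer.
for amount in ["-$5.00", "$500,000,000,000", "$-5.00", "$1,234.56", "abc", ""]:
    print(repr(amount), is_match(ORIGINAL, amount), is_match(TWEAKED, amount))
```

Every part of the tweaked pattern is optional, so it also accepts an empty string or a lone $. If that matters, one option is to require at least one digit, for example by adding a lookahead such as (?=.*\d) right after the ^.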

Benji
  • Thank you Benji. I see the inputs match on https://regexr.com/6om9n, but in Databricks it returns false and I don't know why. I added a screenshot to my question. – Vaishnavi S Jun 28 '22 at 18:59
  • Can someone please help? I am stuck: PySpark returns false for both my expression and Benji's. The screenshot is attached to my question. – Vaishnavi S Jun 28 '22 at 19:50