1

I'm trying to come up with a regex pattern that will work with regexp_like in Impala and which will match values that are decimals (up to ten numbers followed by a decimal followed by one or more numbers).

I have a pattern which is working in .NET ("^-?\d{1,10}\.\d+$"), but this isn't working in regexp_like.

I've tried something similar in Impala ("^-?[0-9]{1,10}\.[0-9]+$"), but it keeps returning true for integers. Why isn't it requiring the decimal to be there?

Some expected scenario results:

0 = False
0. = False
.5 = False
0.1 = True
123456 = False
-123456 = False
123456.2 = True
-123456.2 = True
Test = False

I'd like to make it even more complex and disallow numbers that start with multiple zeros, but I can't even get it to require the decimal point.

WATYF
  • 409
  • 1
  • 5
  • 16

1 Answers1

2

Your regex still matches integers because the \ before d in a string literal disappears and the . remains and matches any char in the string.

See the Impala REGEXP_LIKE documentation:

Because the impala-shell interpreter uses the \ character for escaping, use \\ to represent the regular expression escape character in any regular expressions that you submit through impala-shell . You might prefer to use the equivalent character class names, such as [[:digit:]] instead of \d which you would have to escape as \\d.

Thus, you can use

"^-?[0-9]{1,10}\\.[0-9]+$"
"^-?\\d{1,10}\\.\\d+$"
"^-?[[:digit:]]{1,10}\\.[[:digit:]]+$"
Wiktor Stribiżew
  • 607,720
  • 39
  • 448
  • 563
  • 1
    Thank you, sir. That was driving me nuts. I was wondering why "\d" wasn't working either. – WATYF Oct 29 '20 at 18:16