1

I want to judge if a positive number string is end with ".0", so I wrote the following sql: select '12310' REGEXP '^[0-9]*\.0$'. The result is true however. I wonder why I got the result, since I use "\" before "." to escape. So I write another one as select '1231.0' REGEXP '^[0-9]\d*\.0$', but this time the result is false. Could anyone tell me the right pattern?

leftjoin
  • 36,950
  • 8
  • 57
  • 116
Qing Guo
  • 33
  • 5

2 Answers2

3

Dot (.) in regexp has special meaning (any character) and requires escaping if you want literally dot:

 select '12310' REGEXP '^[0-9]*\\.0$';

Result:

false

Use double-slash to escape special characters in Hive. slash has special meaning and used for characters like \073 (semicolon), \n (newline), \t (tab), etc. This is why for escaping you need to use double-slash. Also for character class digit use \\d:

hive> select '12310.0' REGEXP '^\\d*?\\.0$';
OK
true

Also characters inside square brackets do not need double-slash escaping: [.] can be used instead of \\.

leftjoin
  • 36,950
  • 8
  • 57
  • 116
2

If you know it is a number string, why not just use:

select ( val like '%.0' )

You need regular expression if you want to validate that the string has digits everywhere else. But if you only need to check the last two characters, like is sufficient.

As for your question . is a wildcard in regular expressions. It matches any character.

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
  • yeah, you are right. At last I user `like`, but I wonder why I cannot write the right regular expression. – Qing Guo Nov 28 '19 at 11:39