0

I need to filter rows with repeated numbers in some id field in a table. For that i use a regular expression

\b(\d)\1+\b

This is an example of the regex.

https://regex101.com/r/rJ7hJ6/7

But in impala this solution doesn't work. I tried

select regexp_like('1111111111', '([0-9])\1+'); i this case return True

select regexp_like('2222', '([0-9])\1+'); In this case return False

I think is because impala don't recognize \1 operator so i added another backslash to the query to escape the operator

select regexp_like('1111111111', '([0-9])\\1+');

But when execute that i get a syntax error

Invalid regex expression: '([0-9])\1+'

Someone know whats is happened?? and how can fix that?

Thanks!

Figa17
  • 781
  • 7
  • 20
  • pretty confused between you first regex and second one, both are the same why is the first one returning true while the second one false? – karthick Sep 06 '19 at 19:52
  • I think is because does not´recognize \1 operator and assume the number 1 is repeated one or more times in the string. This is why in the first example work and in the second one not – Figa17 Sep 06 '19 at 20:17
  • Try `$1` instead of `\1` – Matt.G Sep 06 '19 at 20:24
  • @Matt.G When i change \ for $ impala return False for both queries – Figa17 Sep 06 '19 at 20:29
  • Try `\b(?P\d)(?P=test)+\b` [Demo](https://regex101.com/r/px8tpZ/2) – Matt.G Sep 06 '19 at 20:39
  • @Matt.G when execute that i get this: Invalid regex expression: '(?Pd)(?P=test)+' – Figa17 Sep 06 '19 at 20:45
  • 1
    I think that on this page for [REGEXP_LIKE](https://www.cloudera.com/documentation/enterprise/latest/topics/impala_string_functions.html#string_functions__regexp_like) there is a link to [the RE2 documentation](https://github.com/google/re2) where on [this page](https://github.com/google/re2/blob/master/doc/syntax.txt) it states `\1 backreference NOT SUPPORTED` – The fourth bird Sep 07 '19 at 10:56

0 Answers0