Replacing multiple occurrence by single occurrence in string

Question

I have data like "11223311" and I want all the multiple occurrence to be replaced by single occurrence i.e. the above should turn into '123'. I am working in SAP HANA.

But by using below logic I am getting '1231' from '11223311'.

SELECT  REPLACE_REGEXPR('(.)\1+' IN '11223331' WITH '\1' OCCURRENCE ALL)  FROM DUMMY;

djk · Accepted Answer · 2017-02-13T11:25:41.913

3

Your regular expression only replaces multiple consecutive occurrences of characters; that's what the \1+ directly after it's matching (.) is doing.

You can use look-ahead to remove all characters that also occur somewhere after that match. Note that this keeps the last occurrence, not the first:

SELECT  REPLACE_REGEXPR('(.)(?=.*\1)' IN '11223331' WITH '' OCCURRENCE ALL)  FROM DUMMY

This returns: 231

If you want to keep the first occurrence, I don't see a possibility just with one regex (I could be wrong though). Using a look-behind in the same way does not work because it would need to be variable-length, which is not supported in HANA and most other implementations. Often \K is recommended as alternative, but something like (.).*\K\1 wouldn't work with replace all, because all characters before \K are still consumed in replace. If you could run the same regex in a loop, it could work but then why not use a non-regex loop (like a user-defined HANA function) in the first place.

edited Feb 13 '17 at 11:25

answered Feb 13 '17 at 10:22

djk

943
2
9
27

I loved this solution – David דודו Markovitz Feb 13 '17 at 10:34
@hannobo It really worked well, thanks a lott :) Can you tell me how"(.)(?=.*\1)" works?? – Sarthak Srivastava Feb 14 '17 at 08:40
@SarthakSrivastava: Written in text it would be "Match one character `(.)` that is followed by any number of any characters `.*`, followed by the first character again `\1`." The `(?= )` is a look-ahead ensuring that the condition is met but the matching characters are not consumed, so replace can go through character by character. – djk Feb 14 '17 at 09:36

David דודו Markovitz · Answer 2 · 2017-02-14T09:46:42.290

0

Please try this

SELECT  REPLACE_REGEXPR(concat(concat('[^','11223331'),']') IN '0123456789' WITH '' OCCURRENCE ALL)  
FROM    DUMMY;

edited Feb 14 '17 at 09:46

answered Feb 13 '17 at 10:02

David דודו Markovitz

42,900
6
64
88

concat() expects exactly two parameters; this one gives "SAP DBTech JDBC: [316]: wrong number of arguments in function invocation". Why not just write `'[^11223331]'`? – djk Feb 14 '17 at 09:40
@Hannobo - The real use-case is obviously not the hard-coded value 11223331. Code fixed. Thanks – David דודו Markovitz Feb 14 '17 at 09:44

Replacing multiple occurrence by single occurrence in string

2 Answers2