0

Cleaning up ,

With Oracle 11g PL/SQL, for below query, can I get the capture groups' positions (something like what Matcher.start() provides in java).

    `select regexp_replace('1234bankzone1234', '^..(.*)bank(zone).(.*)..$', '\2') from dual`

Result should look like : "zone", 9(start of text "zone").

The bigger problem I was trying to solve is to mask data like account number using patterns like '^.....(.*)..$' (this pattern can vary depending on installation).

Paddy
  • 609
  • 7
  • 25
  • 2
    Still a bit confused on what you want exactly. Can you give a simple input/output example of desired results? – tbone Dec 28 '12 at 14:05
  • I'm with @tbone here; I have no idea what you're trying to mask and what you want it to look like after. – Ben Dec 28 '12 at 14:07
  • @AjithSasidharan, you have to [wait 2 days](http://meta.stackexchange.com/questions/16065/how-does-the-bounty-system-work) to start a bounty. – Ben Dec 28 '12 at 14:08

2 Answers2

0

Will something like below work for you?

select regexp_replace('1234bankzone1234', '^..(.*)bank(zone).(.*)..$', '\2') expr ,instr('1234bankzone1234',regexp_replace('1234bankzone1234', '^..(.*)bank(zone).(.*)..$', '\2')) pos from dual

or more readable subquery like

select a.*, instr(a.value,a.expr) from ( select '1234bankzone1234' value, regexp_replace('1234bankzone1234', '^..(.*)bank(zone).(.*)..$', '\2') expr from dual ) a

I couldn't find any direct equivalent of Matcher API like functionality and there is no way you can access the position group buffer in SQL.

Baski
  • 829
  • 8
  • 14
  • This may cause issues in some cases. – Paddy Dec 29 '12 at 01:12
  • This may cause issues in some cases. If there are more than one capture groups and one match is subset of another (suppose 2 capture groups have "zone" and "azone" match), I'll miss one of those. With different kind of data(device IDs, account/PAN/customer # etc) and multiple installations, the risk goes higher. – Paddy Dec 29 '12 at 01:25
  • It is true. Do you know what would be the size of the text you are trying to and mask and # of groups/patterns you have within them? The reason I ask is grouping in a way is more of string splitting and putting them together. As long as you have < 5-7 distinct groups we can do in SQL. – Baski Dec 29 '12 at 07:21
  • Unfortunately exact size of text-to-be-masked and the masknig pattern may vary. Masking is required for multiple installations for various kind of data (account number, IP address, dates...). For account number itself, I have seen at least 3 different patterns. – Paddy Dec 31 '12 at 06:50
  • This is what I am doing for now : 1) Inverse that masking pattern. 2) Get the indexes and count of capture groups within both original and reversed patterns. 3) Get matching text from the source string for both original and reversed pattern capture groups 4) Merge captured text from step 3 according to indexes got in step 2. Mask the matches with original string. I am providing more details below. – Paddy Dec 31 '12 at 10:29
0

1: Reverse pattern using this

regexp_replace( regexp_replace( regexp_replace( regexp_replace( regexp_replace( regexp_replace( regexp_replace( regexp_replace( regexp_replace( pattern, '(\()', '\1#') , '(\))', '#\1') , '\(#', ')#') , '\^\)#', '^') , '#\)\$', '$') , '#\)', '(#') , '#', '') , '\^([^\(]+\))', '^(\1') , '\(([^\)]+)\$', '(\1)$');

So, "^(.)..(.).$"; becomes "^.(..).(.)$";

2: Use this to bulk collect index and count of capture groups within both patterns

SELECT REGEXP_instr(pattern, '\(.*?\)+', 1, LEVEL) bulk collect into posCapture FROM v CONNECT BY LEVEL <= REGEXP_COUNT(pattern, '\(.*?\)');

3: Match both patterns against the text-to-be-masked. Merge them by the order found in step 2.

select regexp_replace(v_src, pattern, '\' || captureIndex) into tempStr from dual;

Paddy
  • 609
  • 7
  • 25