Using REGEXP_SUBSTR to get key-value pair data

Question

I have a column with below values,

User_Id=446^User_Input=L307-60#/25" AP^^

I am trying to get each individual value based on a specified key.

All value after User_Id= until it encounters ^
All value after User_Input= until it encounters ^

I tried for and so far I have this,

SELECT  LTRIM(REGEXP_SUBSTR('User_Id=446^User_Input=L307-60#/25" AP^'
          ,'[0-9]+',1,1),'^') User_Id 
from dual

How do I get the value for the User_Input??

P.S: User input can have anything, like ',", *,% including a ^ in the middle of the string (that is, not as a delimiter).

Any help would be greatly appreciated..

If "user input can have anything" are you certain that it cannot have a ^? If it can have a caret, how do you differentiate between a caret that is part of the data and a caret that acts as a delimiter? Is there a reason that you need to use regular expressions rather than `INSTR` and `SUBSTR`? — Justin Cave, Mar 02 '15 at 21:56
You're right, the user input can have ^, there is no stop in the application to prevent them from entering it. I am just trying ways out with regular expressions, if you can suggest a way with INSTR and SUBSTR that would be great too.. — Shankar, Mar 02 '15 at 21:59
This string kind of looks like a modified set of attributes in an element of an xml document. If you are parsing xml, there is value in using Oracle's XML parsing features. — Patrick Bacon, Mar 02 '15 at 22:00
No, this is not from an XML, this is from a VARCHAR2 field in the database. — Shankar, Mar 02 '15 at 22:01
What would you want returned if you had the string "User_Id=446^User_Input=L^307-^60^#/25" AP^"? Do you want to amend the algorithm you posted so that it doesn't stop when it finds the first caret and reads to the end of the string ignoring the final caret? If we implement the algorithm you specified, we'd return just "L". Might there be additional key/value pairs after the `User_Input`? — Justin Cave, Mar 02 '15 at 22:03
I just posted the algorithm to show you guys what I could do so far. And there is no more key/value pairs other than the two listed in the query. — Shankar, Mar 02 '15 at 22:05
I wrote an answer on this very topic a little bit ago - it gets all of the `name=value` pairs in a given string: http://stackoverflow.com/a/28597947 — David Faber, Mar 02 '15 at 23:35
Key-Value pairs are a terrible misunderstanding of how to use a relational database. But if you really must store them, and store them in a single column at that, why not use an industry recognised mechanism such as XML or JSON? — APC, Mar 03 '15 at 06:04
Got this answer from a friend of mine.. Posting here just in case any of you find it interesting. SELECT regexp_replace('User_Id=446^User_Input=L307-60#/25" AP^^', '.*User_Id=([^\^]+).*', '\1') User_Id, regexp_replace('User_Id=446^User_Input=L307-60#/25" AP^^', '.*User_Input=(.*)[\^]$', '\1') User_Input FROM dual — Shankar, Mar 03 '15 at 15:42

score 1 · Answer 1 · answered Mar 02 '15 at 22:02

If there is no particular need to use Regex, something like this returns the value.

WITH rslt AS (
SELECT 'User_Id=446^User_Input=L307-60#/25" AP^' val 
  FROM dual
)
SELECT LTRIM(SUBSTR(val
                   ,INSTR(val, '=', 1, 2) + 1
                   ,INSTR(val, '^', 1, 2) - (INSTR(val, '=', 1, 2) + 1)))
  FROM rslt;

Of course, if you can't guarantee that there will not be any carets that are valid text characters, this will possibly return partial results.

Yes, this also filters out the ^ from the string. Thanks for trying though !! — Shankar, Mar 02 '15 at 22:06

APC · Accepted Answer · 2015-03-03T06:40:58.603

This can be easily solved using boring old INSTR to calculate the offsets of the start and end points for the KEY and VALUE strings.

The trick is to use the optional occurrence parameter to identify each the correct instance of =. Because the input can contain carets which aren't intended as delimiters we need to use a negative position to identify the last ^.

with cte as  (
  select kv
         , instr(kv, '=', 1, 1)+1 as k_st  -- first occurrence 
         , instr(kv, '^', 1) as k_end
         , instr(kv, '=', 1, 2)+1 as v_st  -- second occurrence 
         , instr(kv, '^', -1) as v_end     -- counting from back
  from t23
  )
select substr(kv, k_st, k_end - k_st) as user_id
       , substr(kv, v_st, v_end - v_st) as user_input
from cte
/

Here is the requisite SQL Fiddle to prove it works. I think it's much easier to understand than any regex equivalent.

Patrick Bacon · Answer 3 · 2015-03-03T03:01:59.277

Assuming that you will always have 'User_Id=' and 'User_Input=' in your string, I would use a character group approach to parsing

Use the starting anchor,^, and ending anchor, $. Look for 'User_Id=' and 'User_Input='

Associate the value you are searching for with a character group.

    SCOTT@dev> 
  1  SELECT REGEXP_SUBSTR('User_Id=446^User_Input=L307-60#/25" AP^','^User_Id=(.*\^)User_Input=(.*\^)$',1, 1, NULL, 1) User_Id
  2* FROM dual
SCOTT@dev> /

USER
====
446^


SCOTT@dev> 
  1  SELECT REGEXP_SUBSTR('User_Id=446^User_Input=L307-60#/25" AP^','^User_Id=(.*\^)User_Input=(.*\^)$',1, 1, NULL, 2) User_Input
  2* FROM dual
SCOTT@dev> /

USER_INPUT
================
L307-60#/25" AP^

SCOTT@dev>

score 0 · Answer 4 · answered Mar 03 '15 at 15:41

Got this answer from a friend of mine.. Looks simple and works great...

SELECT
regexp_replace('User_Id=446^User_Input=L307-60#/25" AP^^', '.*User_Id=([^\^]+).*', '\1') User_Id,
regexp_replace('User_Id=446^User_Input=L307-60#/25" AP^^', '.*User_Input=(.*)[\^]$', '\1') User_Input
FROM dual

Posting here just in case any of you find it interesting..

Using REGEXP_SUBSTR to get key-value pair data

4 Answers4