Oracle: instr+substr instead of regexp_substr

Question

I got this query from another post I made which uses REGEXP_SUBSTR() to pull out specific information from a string in oracle. It works good but only for small sets of data. When it comes to tables that have 300,000+ records, it is very slow and I was doing some reading that instr + substr might be faster. The example query is:

SELECT REGEXP_SUBSTR(value, '(^|\|)\s*24=\s*(.*?)\s*(\||$)',  1, 1, NULL, 2)  AS "24",
       REGEXP_SUBSTR(value, '(^|\|)\s*35=\s*(.*?)\s*(\||$)',  1, 1, NULL, 2)  AS "35",
       REGEXP_SUBSTR(value, '(^|\|)\s*47A=\s*(.*?)\s*(\||$)', 1, 1, NULL, 2) AS "47A",
       REGEXP_SUBSTR(value, '(^|\|)\s*98A=\s*(.*?)\s*(\||$)', 1, 1, NULL, 2) AS "98A"
FROM   table_name

Table example:

CREATE TABLE table_name (value ) AS
SELECT '35= 88234.00 | 47A= Shawn | 98A= This is a comment |' FROM DUAL UNION ALL
SELECT '24= 123.00 | 98A= This is a comment | 47A= Derick |' FROM DUAL

Output of query would be:

24	35	47A	98A
	88234.00	Shawn	This is a comment
123.00		Derick	This is a comment

Can someone give me an example of how this same query would look if I was doing instr+substr instead?

Thank you.

Please show sample data and desired result, and explain your logic. — Andrew, Nov 04 '21 at 14:09
Your regexp query allows an unknown number of spaces before the hardcoded `24=` and similar. Is that important? If it is, it will be a bit hard to solve the same problem with standard INSTR and SUBSTR. — , Nov 04 '21 at 15:25

score 3 · Accepted Answer · answered Nov 04 '21 at 14:40

SELECT CASE 
       WHEN start_24 > 0
       THEN TRIM(
              SUBSTR(
                value,
                start_24 + 5,
                INSTR(value, '|', start_24 + 5) - (start_24+5)
              )
           )
       END AS "24",
       CASE 
       WHEN start_35 > 0
       THEN TRIM(
              SUBSTR(
                value,
                start_35 + 5,
                INSTR(value, '|', start_35 + 5) - (start_35+5)
              )
           )
       END AS "35",
       CASE 
       WHEN start_47a > 0
       THEN TRIM(
              SUBSTR(
                value,
                start_47a + 6,
                INSTR(value, '|', start_47a + 6) - (start_47a+6)
              )
           )
       END AS "47A",
       CASE 
       WHEN start_98a > 0
       THEN TRIM(
              SUBSTR(
                value,
                start_98a + 6,
                INSTR(value, '|', start_98a + 6) - (start_98a+6)
              )
           )
       END AS "98A"
FROM   (
  SELECT value,
         INSTR(value, '| 24=') AS start_24,
         INSTR(value, '| 35=') AS start_35,
         INSTR(value, '| 47A=') AS start_47a,
         INSTR(value, '| 98A=') AS start_98a
  FROM   (
    SELECT '| ' || value AS value FROM table_name
  )
);

Which, for your sample data, outputs:

24 35 47A 98A

88234.00 Shawn This is a comment

123.00 Derick This is a comment

db<>fiddle here

This is perfect thank you. The query I was using was taking over an hour to run and with this new example, I modified my query and it now takes 2 minutes and 28 seconds to run. — CreationSL, Nov 04 '21 at 17:04

doberkofler · Answer 2 · 2021-11-04T15:01:17.713

0

Given the data in your example it seems you could also use a procedural approach for your data extraction, but I'm sceptical if this could be faster.

The following function get24 for example extracts the columns "24" just using INSTR and SUBSTR.

CREATE OR REPLACE FUNCTION get24(value IN VARCHAR2) RETURN VARCHAR2
IS
    i PLS_INTEGER;
    s VARCHAR2(32767);
BEGIN
  i := INSTR(value, '24= ');
  IF (i <> 1) THEN
    RETURN NULL;
  END IF;
  s := SUBSTR(value, i + 4);
  i := INSTR(s, ' | ');
  IF (i = 0) THEN
    RETURN NULL;
  END IF;
  RETURN SUBSTR(s, 1, i - 1);
END;
/

SELECT get24(value) "24" FROM table_name;

You could then also try using a pipelined function and do all the data extraction within the pipelined function.

edited Nov 04 '21 at 15:01

answered Nov 04 '21 at 14:51

doberkofler

9,511
18
74
126

1

If the OP's data includes `124= abc | 24= def |` then this would get `abc` and not `def`. You would need to include a leading delimiter to prevent this. – MT0 Nov 04 '21 at 14:53
Correct; this should also need to be considered. I mostly wanted to offer a procedural option but I really like your pure sql solution. – doberkofler Nov 04 '21 at 14:58
1

You could make it generic [db<>fiddle](https://dbfiddle.uk/?rdbms=oracle_18&fiddle=272a3c91c91f4d692f0188ae5c946e17). – MT0 Nov 04 '21 at 15:08
Just tested it and with a large data set, the functional approach is actually slower. Without having it tested, I would rather go for a pipelined function if using a procedural approach. In real live I would probably decide based on how static the extraction of the columns is. – doberkofler Nov 04 '21 at 15:20

Oracle: instr+substr instead of regexp_substr

2 Answers2