1

I got this query from another post I made which uses REGEXP_SUBSTR() to pull out specific information from a string in oracle. It works good but only for small sets of data. When it comes to tables that have 300,000+ records, it is very slow and I was doing some reading that instr + substr might be faster. The example query is:

SELECT REGEXP_SUBSTR(value, '(^|\|)\s*24=\s*(.*?)\s*(\||$)',  1, 1, NULL, 2)  AS "24",
       REGEXP_SUBSTR(value, '(^|\|)\s*35=\s*(.*?)\s*(\||$)',  1, 1, NULL, 2)  AS "35",
       REGEXP_SUBSTR(value, '(^|\|)\s*47A=\s*(.*?)\s*(\||$)', 1, 1, NULL, 2) AS "47A",
       REGEXP_SUBSTR(value, '(^|\|)\s*98A=\s*(.*?)\s*(\||$)', 1, 1, NULL, 2) AS "98A"
FROM   table_name

Table example:

CREATE TABLE table_name (value ) AS
SELECT '35= 88234.00 | 47A= Shawn | 98A= This is a comment |' FROM DUAL UNION ALL
SELECT '24= 123.00 | 98A= This is a comment | 47A= Derick |' FROM DUAL

Output of query would be:

24 35 47A 98A
88234.00 Shawn This is a comment
123.00 Derick This is a comment

Can someone give me an example of how this same query would look if I was doing instr+substr instead?

Thank you.

Barbaros Özhan
  • 59,113
  • 10
  • 31
  • 55
CreationSL
  • 55
  • 1
  • 11

2 Answers2

3
SELECT CASE 
       WHEN start_24 > 0
       THEN TRIM(
              SUBSTR(
                value,
                start_24 + 5,
                INSTR(value, '|', start_24 + 5) - (start_24+5)
              )
           )
       END AS "24",
       CASE 
       WHEN start_35 > 0
       THEN TRIM(
              SUBSTR(
                value,
                start_35 + 5,
                INSTR(value, '|', start_35 + 5) - (start_35+5)
              )
           )
       END AS "35",
       CASE 
       WHEN start_47a > 0
       THEN TRIM(
              SUBSTR(
                value,
                start_47a + 6,
                INSTR(value, '|', start_47a + 6) - (start_47a+6)
              )
           )
       END AS "47A",
       CASE 
       WHEN start_98a > 0
       THEN TRIM(
              SUBSTR(
                value,
                start_98a + 6,
                INSTR(value, '|', start_98a + 6) - (start_98a+6)
              )
           )
       END AS "98A"
FROM   (
  SELECT value,
         INSTR(value, '| 24=') AS start_24,
         INSTR(value, '| 35=') AS start_35,
         INSTR(value, '| 47A=') AS start_47a,
         INSTR(value, '| 98A=') AS start_98a
  FROM   (
    SELECT '| ' || value AS value FROM table_name
  )
);

Which, for your sample data, outputs:

24 35 47A 98A
88234.00 Shawn This is a comment
123.00 Derick This is a comment

db<>fiddle here

MT0
  • 143,790
  • 11
  • 59
  • 117
  • This is perfect thank you. The query I was using was taking over an hour to run and with this new example, I modified my query and it now takes 2 minutes and 28 seconds to run. – CreationSL Nov 04 '21 at 17:04
0

Given the data in your example it seems you could also use a procedural approach for your data extraction, but I'm sceptical if this could be faster.

The following function get24 for example extracts the columns "24" just using INSTR and SUBSTR.

CREATE OR REPLACE FUNCTION get24(value IN VARCHAR2) RETURN VARCHAR2
IS
    i PLS_INTEGER;
    s VARCHAR2(32767);
BEGIN
  i := INSTR(value, '24= ');
  IF (i <> 1) THEN
    RETURN NULL;
  END IF;
  s := SUBSTR(value, i + 4);
  i := INSTR(s, ' | ');
  IF (i = 0) THEN
    RETURN NULL;
  END IF;
  RETURN SUBSTR(s, 1, i - 1);
END;
/

SELECT get24(value) "24" FROM table_name;

You could then also try using a pipelined function and do all the data extraction within the pipelined function.

doberkofler
  • 9,511
  • 18
  • 74
  • 126
  • 1
    If the OP's data includes `124= abc | 24= def |` then this would get `abc` and not `def`. You would need to include a leading delimiter to prevent this. – MT0 Nov 04 '21 at 14:53
  • Correct; this should also need to be considered. I mostly wanted to offer a procedural option but I really like your pure sql solution. – doberkofler Nov 04 '21 at 14:58
  • 1
    You could make it generic [db<>fiddle](https://dbfiddle.uk/?rdbms=oracle_18&fiddle=272a3c91c91f4d692f0188ae5c946e17). – MT0 Nov 04 '21 at 15:08
  • Just tested it and with a large data set, the functional approach is actually slower. Without having it tested, I would rather go for a pipelined function if using a procedural approach. In real live I would probably decide based on how static the extraction of the columns is. – doberkofler Nov 04 '21 at 15:20