1

I'm trying to parse a dollar amount from a string.

Example strings:

  • *SOC 1369.00 - NCS 1239.46 = PT LIAB 129.54
  • *SOC 1369.00 - NCS 1239.46 = PT LIAB 140
  • *SOC = 1178.00
  • *SOC 1622.00 - NCS 209.74 = PT LIAB 1412.26 RECIPIENT AGE
  • *LINE #1 SOC 0.00 - NCS 22.77 = LIAB -22.77
  • SOC MET AND CLEARED, SOC 2062-NCS 498.56=PT LIABLE 1563.44
  • *SOC 1622.00 - NCS 209.74 = PT LIAB 1412 RECIPIENT AGE 1234

I want to pull the patient liability, which is the dollar amount following text "LIAB," "PT LIAB," or "LIABLE." The dollar amount can be negative, and may or may not can have a decimal.

My solution:

    REPLACE(REPLACE(REGEXP_SUBSTR(REMARKS,'LIAB+[LE]?+ (-?+\d+[.]?)+\d'),'LIAB ',''),'LIABLE ','')

This seems to be a bit clunky, and I assume there's a more simple solution. Any guidance on would be appreciated!

I'm using Toad for Oracle 12.8.

boognish
  • 50
  • 8

2 Answers2

1

Give this a try:

SQL> with tbl(rownbr, remarks) as (
         select 1, '*SOC 1369.00 - NCS 1239.46 = PT LIAB 129.54'                from dual union
         select 2, '*SOC 1369.00 - NCS 1239.46 = PT LIAB 140'                   from dual union
         select 3, '*SOC = 1178.00'                                             from dual union
         select 4, '*SOC 1622.00 - NCS 209.74 = PT LIAB 1412.26 RECIPIENT AGE'  from dual union
         select 5, '*LINE #1 SOC 0.00 - NCS 22.77 = LIAB -22.77'                from dual union
         select 6, 'SOC MET AND CLEARED, SOC 2062-NCS 498.56=PT LIABLE 1563.44' from dual union
         select 7, '*SOC 1622.00 - NCS 209.74 = PT LIAB 1412 RECIPIENT AGE 1234' from dual
       )
       select rownbr,
             case
               when remarks = regexp_replace(remarks, '.*((LIAB|LIABLE) ([-.0-9]+)).*$', '\3') then
                 '0' -- regexp_replace returns the orig string if the pattern is not found.
               else
                 regexp_replace(remarks, '.*((LIAB|LIABLE) ([-.0-9]+)).*$', '\3')
             end patient_liability
      from tbl;

    ROWNBR PATIENT_LIABILITY
---------- -------------------------
         1 129.54
         2 140
         3 0
         4 1412.26
         5 -22.77
         6 1563.44
         7 1412

7 rows selected.

SQL>
Gary_W
  • 9,933
  • 1
  • 22
  • 40
0

You can almost get there with REGEXP_REPLACE() and a back reference:

REGEXP_REPLACE(REMARKS,'.*(PT LIAB|LIAB|LIABLE) (-?\d+[.]?\d+).*', '\2')

... but that passes a value without a matching pattern through untouched (so your third example would still get *SOC = 1178.00. You can use a case expression and REGEXP_LIKE() to avoid that:

with t (remarks) as (
  select '*SOC 1369.00 - NCS 1239.46 = PT LIAB 129.54' from dual
  union all select '*SOC 1369.00 - NCS 1239.46 = PT LIAB 140' from dual
  union all select '*SOC = 1178.00' from dual
  union all select '*SOC 1622.00 - NCS 209.74 = PT LIAB 1412.26 RECIPIENT AGE' from dual
  union all select '*LINE #1 SOC 0.00 - NCS 22.77 = LIAB -22.77' from dual
  union all select 'SOC MET AND CLEARED, SOC 2062-NCS 498.56=PT LIABLE 1563.44' from dual
)
SELECT REMARKS,
  CASE WHEN REGEXP_LIKE(REMARKS, '.*(PT LIAB|LIAB|LIABLE) (-?\d+[.]?\d+).*')
    THEN REGEXP_REPLACE(REMARKS,'.*(PT LIAB|LIAB|LIABLE) (-?\d+([.]\d+)?).*', '\2')
    END as liability
from t;

REMARKS                                                    LIABILITY
---------------------------------------------------------- ----------
*SOC 1369.00 - NCS 1239.46 = PT LIAB 129.54                129.54    
*SOC 1369.00 - NCS 1239.46 = PT LIAB 140                   140       
*SOC = 1178.00                                                       
*SOC 1622.00 - NCS 209.74 = PT LIAB 1412.26 RECIPIENT AGE  1412.26   
*LINE #1 SOC 0.00 - NCS 22.77 = LIAB -22.77                -22.77    
SOC MET AND CLEARED, SOC 2062-NCS 498.56=PT LIABLE 1563.44 1563.44   

but that doesn't seem much better, and having the regex twice makes it even more expensive. (It could probably also still be simplified...). You could also use REGEXP_REPLACE() instead of the two plain REPLACE() calls:

REGEXP_REPLACE(
  REGEXP_SUBSTR(REMARKS, '(PT LIAB|LIAB|LIABLE) (-?\d+([.]\d+)?)'),
  '(PT LIAB|LIAB|LIABLE) ')

but again that's making it more expensive.

Alex Poole
  • 183,384
  • 11
  • 179
  • 318