0

I am looking at large strings in text. In the text alot of times have . or especially ? ? that serves as delimiters.

So what I want from the string 'Delivery Note ?Patient was at home. Delivered at home'

is Delivered at home BUT NOT Delivery Note ?Patient was at home which is what I am getting currently. Code below works well otherwise.

ADDED: Given the comment below, use of regular expressions is used here as there many permutations in the actual data source such as delivered at home, delivery in an ambulance, delivery in vehicle or born in an ambulance, born at home, etc, etc.

Any suggestions appreciated-

with txt as (select 1 as ID,
            'Delivery Note ?Patient was at home. Delivered at home' as note_text from dual)
    select txt.*,
regexp_substr(NOTE_TEXT, '(born|birth|sv(b|d)|deliver(ed|y|ing)|vbac)[^.|?|,|(|)|;|:|-].{1,24}(vehicle|bathroom|ambulance|home|[^a-z]car([^a-z|]|$))',1,1,'i') 
    as results
    
    from txt;
  • 1
    Perhaps you should explain rules which lead to desired result. As you posted just a single example, my "answer" might be that you'd rather select everything after the last dot (`substr(note_text, instr(note_text, '.', -1) + 1)`) and that's way simpler than such a complex regular expression. – Littlefoot Apr 19 '21 at 18:33
  • Thanks for the tip. And yes it is simpler for sure. Unfortunately I need to use a regular expression due to the number of permutations the notes have. Also in the code you provided-there may be a period. Or not. There may a question mark, or not..Overall it works well except for this last part. – Lawrence Block Apr 19 '21 at 19:25

1 Answers1

0

Looks like this is what you need:

with txt as (select 1 as ID,
            'Delivery Note ?Patient was at home. Delivered at home' as note_text from dual)
    select txt.*,
regexp_substr(NOTE_TEXT, '(born|birth|sv(b|d)|deliver(ed|y|ing)|vbac)[^.?,)(;:-]{1,24}(vehicle|bathroom|ambulance|home|[^a-z]car([^a-z|]|$))',1,1,'i') 
    as results    
    from txt;

As you can see I removed | in [^...], because you just need to specify excluded character without or, and removed . after it.

Result:

        ID NOTE_TEXT                                             RESULTS
---------- ----------------------------------------------------- --------------------
         1 Delivery Note ?Patient was at home. Delivered at home Delivered at home
Sayan Malakshinov
  • 8,492
  • 1
  • 19
  • 27