-3
Concerta 18 mg/24 hr oral tablet, extended release

Expected Output

tablet, extended release

Input

glipiZIDE 10 mg oral tablet, extended release

Expected Output

tablet, extended release

Input

Adderall XR 10 mg oral capsule, extended release

Expected Output

capsule, extended release

I used the below query and getting tables or capsule as output.

select trim(
         regexp_substr(
           lower('Adderall XR 10 mg oral capsule, extended release' ),
           ' ((caps|tab|powd|syr|aero|liq|susp)[a-z]+|solution|lotion|spray|([^a-z0-9])?extended release )+'
         )
       )
from dual;
MT0
  • 143,790
  • 11
  • 59
  • 117
  • 1
    Your request is missing the most important part: The explanation. By which rule do you want to extract the string? So far it seems it is the text following 'oral ', but will this always be the case? Examples are good, but only when added to an explanation not to substitute it. – Thorsten Kettner Feb 21 '22 at 07:39
  • Please don't link to images when not absolutely necessary. And there is still no explanation in your request. Please state the rules precisely that are needed to determine the result string. – Thorsten Kettner Feb 21 '22 at 07:50
  • if the word tablet is there in the string I need "tablet" alone in output and the same for "capsule" also. If string contains "capsule, extended release" or "table, extended release" I need only "tablet, extended release" or ""capsule, extended release" – Sqllearner Feb 21 '22 at 07:53
  • That would be something like `case when text like '%tablet, extended release%' then 'tablet, extended release' when text like '%tablet%' then 'tablet' when text like '%capsule, extended release%' then ...`. And in case it's neither tablet nor capsule (e.g. liquid) then the result shall be null? – Thorsten Kettner Feb 21 '22 at 07:59

2 Answers2

0

With as many info you posted (read: no explanation at all), why would you need regular expressions? Simple substr + instr returns desired output:

SQL> with test (id, col) as
  2    (select 1, 'Concerta 18 mg/24 hr oral tablet, extended release' from dual union all
  3     select 2, 'glipiZIDE 10 mg inhalation aerosol'                 from dual union all
  4     select 3, 'Adderall XR 10 mg subcutaneous solution'            from dual
  5    )
  6  select id,
  7    substr(col, case when instr(col, 'oral')         > 0 then instr(col, 'oral') + 5
  8                     when instr(col, 'inhalation')   > 0 then instr(col, 'inhalation') + 11
  9                     when instr(col, 'subcutaneous') > 0 then instr(col, 'subcutaneous') + 13
 10                end
 11          ) result
 12  from test;

        ID RESULT
---------- --------------------------------------------------
         1 tablet, extended release
         2 aerosol
         3 solution

SQL>
Littlefoot
  • 131,892
  • 15
  • 35
  • 57
  • Pattern may get change in stage data, so am going for regexp_substr eg1-amoxicillin 500 mg oral tablet desired output-table,eg2-acetaminophen-hydrocodone 325 mg-7.5 mg oral tablet output-table,eg3-Toprol-XL 50 mg oral tablet, extended release output-tablet, extended release – Sqllearner Feb 21 '22 at 07:36
  • 1
    All those values would still be OK if you apply query I posted. Lines #1 - 5 represent **sample data**, you don't need that. Query you do need begins at line #6. – Littlefoot Feb 21 '22 at 07:40
  • I have added sample data as image format for your reference – Sqllearner Feb 21 '22 at 07:44
  • Aha; there's something else (besides "oral"). I edited the query; have a look. If another option appears, just add it into the CASE expression. – Littlefoot Feb 21 '22 at 07:51
  • There are millions of records with different patterns. I can't use this dude – Sqllearner Feb 21 '22 at 07:55
  • I doubt it. There aren't *millions* of ways those drugs can be "applied". You either drink them (so it is "oral"), use "inhalation", get them injected, and ... what else? There's a definite number of ways, so you might need to add a couple of WHEN clauses into the CASE expression. What "millions" are you talking about? – Littlefoot Feb 21 '22 at 08:01
  • 3
    @Sqllearner: Your problem is not SQL here. Again and again you are giving us examples of how a string might look like and what you would the result string to be then. But what is the *rule* to determine the result string? A rule that works for any string, not only a particular example. If you are not able to describe the task to a human, how then do you think you will be able to tell a machine? – Thorsten Kettner Feb 21 '22 at 08:05
0

You can use:

SELECT TRIM(
         REGEXP_SUBSTR(
           LOWER(value),
           -- Leading space
           ' '
           -- Dosage form
           || '((caps|tab|powd|syr|aero|liq|susp)[a-z]*|solution|lotion|spray)'
           -- Optional trailing punctuation then "extended release"
           || '([^a-z0-9]*extended release)?'
         )
       ) AS dosage_form
FROM   table_name;

Which, for your sample data:

CREATE TABLE table_name (value) AS
SELECT 'Concerta 18 mg/24 hr oral tablet, extended release' FROM DUAL UNION ALL
SELECT 'glipiZIDE 10 mg oral tablet, extended release'      FROM DUAL UNION ALL
SELECT 'Adderall XR 10 mg oral capsule, extended release'   FROM DUAL;

Outputs:

DOSAGE_FORM
tablet, extended release
tablet, extended release
capsule, extended release

Alternatively, you can list all the dosage forms in a table:

CREATE TABLE dosage_forms (form) AS
SELECT 'capsule'    FROM DUAL UNION ALL
SELECT 'capslet'    FROM DUAL UNION ALL
SELECT 'caps'       FROM DUAL UNION ALL
SELECT 'tablet'     FROM DUAL UNION ALL
SELECT 'powder'     FROM DUAL UNION ALL
SELECT 'syringe'    FROM DUAL UNION ALL
SELECT 'aerosol'    FROM DUAL UNION ALL
SELECT 'liquid'     FROM DUAL UNION ALL
SELECT 'suspension' FROM DUAL UNION ALL
SELECT 'solution'   FROM DUAL UNION ALL
SELECT 'lotion'     FROM DUAL UNION ALL
SELECT 'spray'      FROM DUAL;

CREATE TABLE dosage_form_modifiers (form_modifier) AS
SELECT 'extended release' FROM DUAL;

then use the query (from Oracle 12):

SELECT *
FROM   table_name t
       LEFT OUTER JOIN LATERAL (
         SELECT REGEXP_SUBSTR(
                  t.value,
                  '( |^)'
                  || '('
                  || form
                  || form_modifier
                  || ')',
                  1,
                  1,
                  'i',
                  2
                ) AS match
         FROM   dosage_forms df
                CROSS JOIN (
                  SELECT '[^a-z0-9]*' || form_modifier AS form_modifier
                  FROM   dosage_form_modifiers
                UNION ALL
                  SELECT NULL FROM DUAL
                ) dfm
         ORDER BY
                LENGTH(match) DESC NULLS LAST,
                match NULLS LAST
         FETCH FIRST ROW ONLY
       )
       ON (1 = 1)

db<>fiddle here

MT0
  • 143,790
  • 11
  • 59
  • 117