1

Using Oracle 12c, how can I use regexp_substr to separate a tab delimited record whose fields may contain spaces? The record has four fields. The third field contains words with spaces.

I'm using this as my reference:Oracle Regex

Here's my query

with tab_delimited_record as
 (select 'Field1    Field2  This is field3 and contains spaces  Field4' as record_with_fields from dual) 
 select record_with_fields,
        regexp_substr('\S+',1,3) as field3a, -- Expect ==>This is field3...
        regexp_substr('\t+',1,3) as field3b, -- Expect==>This is field3...
        regexp_substr('[[::space::]]+',1,3) as field_3c -- Another version
  from  tab_delimited_record

Desired Results

RECORD_WITH_FIELDS

Field1 Field2 This is field3 and contains spaces Field4

FIELD3

This is field3 and contains spaces

zundarz
  • 1,540
  • 3
  • 24
  • 40

3 Answers3

1

I believe you are looking for something like this. Note this example returns all fields for example's sake but of course you can just select field3 if that's all you need. The CTE builds string with tab-delimited fields. The query then uses regex_substr to get the nth (4th argument) string of characters followed by a TAB or the end of the line.

with tab_delimited_record(record_with_fields) as (
  select 'Field1'||chr(09)||'Field2'||chr(09)||'This is field3 and contains spaces'||chr(09)||'Field4' from dual
) 
select record_with_fields,
       regexp_substr(record_with_fields, '(.*?)('||chr(09)||'|$)', 1, 1, null, 1) as field_1, 
       regexp_substr(record_with_fields, '(.*?)('||chr(09)||'|$)', 1, 2, null, 1) as field_2, 
       regexp_substr(record_with_fields, '(.*?)('||chr(09)||'|$)', 1, 3, null, 1) as field_3,
       regexp_substr(record_with_fields, '(.*?)('||chr(09)||'|$)', 1, 4, null, 1) as field_4
from  tab_delimited_record;
Gary_W
  • 9,933
  • 1
  • 22
  • 40
1

You cannot inert '\t' literally when working with Oracle SQL's. You need to break the string, use chr(09) ( ascii tab) and then construct the string. Try this out

with tab_delimited_record as
 (select 'Field1'||chr(09)||'Field2'||chr(09)||'This is field3 and contains spaces'||chr(09)||'Field4' as record_with_fields from dual) 
    select record_with_fields,
        regexp_substr(record_with_fields,'(\S+)\s+(\S+)\s+(.+)\s+',1,1,'',3) as field3a, -- Expect ==>This is field3...
        regexp_substr(record_with_fields,'(\S+)'||chr(09)||'(\S+)'||chr(09)||'(.+)\s+',1,1,'',3) as field3b, -- Expect==>This is field3...
        regexp_substr(record_with_fields,'(\S+)[[:space:]]+(\S+)[[:space:]]+(.+)[[:space:]]+',1,1,'',3) as field_3c -- Another version
  from  tab_delimited_record
stack0114106
  • 8,534
  • 3
  • 13
  • 38
1

Another version of regular expression:

with tab_delimited_record(record_with_fields) as (
  select 'Field1'||chr(09)||'Field2'||chr(09)||'This is field3 and contains spaces'||chr(09)||'Field4' from dual
) 
select record_with_fields,
       regexp_substr(record_with_fields, '[^'||chr(09)||']+', 1, 3) as field_3
from  tab_delimited_record;