0

I know how to read the data from file using utl_file but I am facing an issue with the data below. We can't assume in which position data will come as null. How can we handle this?

Sample data: apple|bat|cat|"dog | dog"||||eee||abc

Expected output:

col1:apple
col2:bat
col3:cat
col4:dog dog
col5:
col6:
col7:
col8:eee
col9:
col10:abc

i have tried below code but it's not handling null values

declare
list varchar2(3500) :='apple|bat|cat|"dog | dog"||||eee||abc';
pattern varchar2(20) := '(" [^"]*"|[^|]+)';
i number:=0;
j number;
f varchar2(3500);
 c sys_refcursor;
begin
dbms_output.put_line('Raw list: ' || list);
 open c for
             select  level as col,
                 trim(regexp_substr(replace(list,'|','|'), pattern, 1, rownum))split  
             from dual
             connect by level <= length(regexp_replace(list, pattern))  + 1;
        loop
            fetch c into j, f;
            exit when c%notfound;
            dbms_output.put_line('Column ' || i || ': ' || replace(f, '"'));
            i:=i+1;
        end loop;
        close c;

end;

i am getting below output but i need expected output.

Raw list: apple|bat|cat|"dog|dog"||||eee||abc
Column 0: apple
Column 1: bat
Column 2: cat
Column 3: dog
Column 4: dog
Column 5: eee
Column 6: abc
Column 7:
Column 8:
Column 9:
Column 10:

Luke Woodward
  • 63,336
  • 16
  • 89
  • 104
Devendra S
  • 11
  • 1
  • 1
  • 4

1 Answers1

0

I wouldn't use regular expressions for this.

Instead, I'd use ordinary string-manipulation functions to break the string apart. For example:

declare
    list varchar2(3500) := 'apple|bat|cat|"dog | dog"||||eee||abc';

    next_pipe_pos       integer;
    close_quote_pos     integer;
    column_start_pos    integer;
    column_num          integer;
    column_text         varchar2(4000);
begin
    column_start_pos := 1;
    column_num := 1;

    -- Appending a | character allows us to assume that all columns have a
    -- pipe following them and avoids any special-case handling for the last
    -- column.
    list := list || '|';

    while column_start_pos <= length(list)
    loop
        if substr(list, column_start_pos, 1) = '"' then
            close_quote_pos := instr(list, '"|', column_start_pos + 1);
            if close_quote_pos = 0 then
                -- Mismatched quotes.
                raise no_data_found;
            end if;

            column_text := substr(list, column_start_pos + 1, close_quote_pos - column_start_pos - 1);
            column_start_pos := close_quote_pos + 2;
        else
            next_pipe_pos := instr(list, '|', column_start_pos);
            exit when next_pipe_pos = 0;
            column_text := substr(list, column_start_pos, next_pipe_pos - column_start_pos);
            column_start_pos := next_pipe_pos + 1;
        end if;

        dbms_output.put_line('Column ' || column_num || ': ' || column_text);

        column_num := column_num + 1;        
    end loop;
end;

It's more code, but it's arguably less cryptic than the regexp you were using.

Output from running this:

Column 1: apple
Column 2: bat
Column 3: cat
Column 4: dog | dog
Column 5:
Column 6:
Column 7:
Column 8: eee
Column 9:
Column 10: abc
Luke Woodward
  • 63,336
  • 16
  • 89
  • 104