4

In my project i use oracle as primary database and i've faced a problem with parsing clob. So suppose we have a clob with value

   aaaaaa
   cccccc
   bbbbbb

And it's stored in table test ...

I need to write plsql procedure to get this clob and split it so that i will have array with three items [aaaaaa,cccccccc,bbbbbbb].

Is there any possible solutions?

Andrew
  • 26,629
  • 5
  • 63
  • 86
ilya.stmn
  • 1,604
  • 5
  • 23
  • 41
  • What have you tried? Have you looked at the [DBMS_LOB](http://docs.oracle.com/cd/E14072_01/appdev.112/e10577/d_lob.htm) package? If you can do this to a `VARCHAR2` string, you can adapt that approach with the related LOB operations (e..g INSTR, SUBSTR). – Alex Poole Jul 25 '12 at 11:04
  • 2
    Have a look at the Oracle documentation for the [DBMS_LOB package](http://docs.oracle.com/cd/B28359_01/appdev.111/b28419/d_lob.htm#i1015792). Go to it, and good luck. – Bob Jarvis - Слава Україні Jul 25 '12 at 11:13
  • i have tried this code `FOR i IN (SELECT * from test ) LOOP LOOP EXIT WHEN nEndIndex<1; nEndIndex := INSTR(i.value, CHR(10),nStartIndex); vLine := SUBSTR(i.incass, nStartIndex, nEndIndex); DBMS_OUTPUT.put_line(vLine); nStartIndex := nEndIndex + 1; nLineIndex:=nLineIndex+1; END LOOP; END LOOP; ` – ilya.stmn Jul 26 '12 at 02:43
  • 3
    And... what happened? Do you get an error? The wrong results? (Please post code and results as edits to the question, not as comments). I suggest you work out the logic with a `VARCHAR2` column, then look at the `DBMS_LOB` equivalents to deal with a CLOB instead. This looks like it will almost work for `VARCHAR2`, if `nStartIndex` is initialised to `1` and `nEndIndex` is `0` or higher, but the last parm of `substr` is wrong, and it won't show the last line I think. – Alex Poole Jul 26 '12 at 11:58

9 Answers9

15

Here is a piece of code that works. I suggest that you use explicit cursors instead of implicit ones (FOR i IN (select...)), for performance purpose.

First here is the script to create testcase.

create table test (c clob);

insert into test (c) values (
'azertyuiop
qsdfghjklm
wxcvbn
');

Then here is the script to read line by line Clob :

/* Formatted on 28/08/2012 14:16:52 (QP5 v5.115.810.9015) */
declare
    nStartIndex number := 1;
    nEndIndex number := 1;
    nLineIndex number := 0;
    vLine varchar2(2000);

    cursor c_clob is
    select c from test;

    c clob;
    -------------------------------
    procedure printout
       (p_clob in out nocopy clob) is
      offset number := 1;
      amount number := 32767;
      len    number := dbms_lob.getlength(p_clob);
      lc_buffer varchar2(32767);
      i pls_integer := 1;
    begin
      if ( dbms_lob.isopen(p_clob) != 1 ) then
        dbms_lob.open(p_clob, 0);
      end if;
      amount := instr(p_clob, chr(10), offset);
      while ( offset < len )
      loop
        dbms_lob.read(p_clob, amount, offset, lc_buffer);
        dbms_output.put_line('Line #'||i||':'||lc_buffer);
       offset := offset + amount;
       i := i + 1;
      end loop; 
          if ( dbms_lob.isopen(p_clob) = 1 ) then
        dbms_lob.close(p_clob);
      end if; 
    exception
      when others then
         dbms_output.put_line('Error : '||sqlerrm);
    end printout;
    ---------------------------
begin
    dbms_output.put_line('-----------');
    open c_clob;
    loop
       fetch c_clob into c;
       exit when c_clob%notfound;
       printout(c);
    end loop;
    close c_clob;
end;

'amount' variable is used to detect end of line position. Be carfull, in some case the end of line is CHR(10)||CHR(13) (CR + LF), and in some other cases it is only CHR(10).

tonyf
  • 34,479
  • 49
  • 157
  • 246
Pierre-Gilles Levallois
  • 4,161
  • 4
  • 28
  • 37
  • I'm getting some very weird results with this if the first line is shorter than the other lines. Looking into it... Example test table: insert into test (c) values ( 'az'||chr(10)|| 'qsdfghjklm'||chr(10)|| 'wxcvbn'||chr(10)); – Wouter Aug 14 '15 at 14:01
  • I've found some more bugs in this code, and tried to fix it. Check out my answer. – Wouter Aug 14 '15 at 15:08
  • @Pierre-Gilles Levallois updated your code example to ensure that you were referencing the correct column (c) within your test table as part of your cursor c_clob. Your solution now works fine. – tonyf Apr 21 '17 at 14:50
11

While the SQL regexp / connect by level approach is probably the most elegant, it is quite bad performancewise (for my testcase on 11.2.0.3.0). Much faster is a simple parse like this.

procedure parse_clob(p_clob in clob) is
l_offset pls_integer:=1;
l_line varchar2(32767);
l_total_length pls_integer:=length(p_clob);
l_line_length pls_integer;
begin
  while l_offset<=l_total_length loop
    l_line_length:=instr(p_clob,chr(10),l_offset)-l_offset;
    if l_line_length<0 then
      l_line_length:=l_total_length+1-l_offset;
    end if;
    l_line:=substr(p_clob,l_offset,l_line_length);
    dbms_output.put_line(l_line); --do line processing
    l_offset:=l_offset+l_line_length+1;
  end loop;
end parse_clob;
Andy Haack
  • 429
  • 3
  • 6
4

In case... - you have APEX installed - and the clob is less than 32K you may also want to look into the following code:

declare
  l_text varchar2(32767) := '...';
  l_rows wwv_flow_global.vc_arr2;
begin
  l_rows := apex_util.string_to_table(l_text, chr(10));
  for i in 1 .. l_rows.count loop
    dbms_output.put_line(l_rows(i));
  end loop;
end;
/
2

A pipelined function with some additional options to drive the behavior. Tested/works on Windows, Oracle 11g (I have some suspicions it may fail in *nix environments because of how lines terminate).

CREATE OR REPLACE FUNCTION ETL_HELPER_PARSE
   (P_CLOB NCLOB, P_LINES_TO_SKIP INT DEFAULT 0, P_PUT_EMPTY_LINES CHAR DEFAULT 'N') RETURN SYS.ODCIVarchar2List PIPELINED
AS
  c_top_lines_to_skip  CONSTANT NUMBER  NOT NULL := P_LINES_TO_SKIP;
  c_output_empty_lines CONSTANT CHAR(1) NOT NULL := P_PUT_EMPTY_LINES; 
  --
  l_len     INT := DBMS_LOB.GETLENGTH(P_CLOB);
  l_hit     INT := 0;
  l_offset  INT := 1;
  l_amount  INT;  
  l_buffer  VARCHAR2(32767);
  l_cnt     INT := 1;  
BEGIN  
  WHILE ( l_offset < l_len )
  LOOP
    l_hit := DBMS_LOB.INSTR (
     lob_loc    => P_CLOB           -- IN   CLOB      CHARACTER SET ANY_CS
    ,pattern    => CHR(13)||CHR(10) -- IN   VARCHAR2  CHARACTER SET lob_loc%CHARSET
    ,offset     => l_offset         -- IN   INTEGER := 1
    ,nth        => 1                -- IN   INTEGER := 1
    );
    l_amount := CASE WHEN COALESCE(l_hit, 0) > 0 THEN l_hit - l_offset ELSE l_len - l_offset + 1 END;
    -- `l_amount=0` means a new empty line has been encountered
    IF l_cnt > c_top_lines_to_skip       
    THEN
      IF l_amount > 0
      THEN
        DBMS_LOB.READ(P_CLOB, l_amount, l_offset, l_buffer);
        PIPE ROW (l_buffer);
      ELSIF UPPER(c_output_empty_lines) = 'Y'
      THEN
        PIPE ROW ('');
      END IF;
    END IF;

    l_offset := CASE WHEN COALESCE(l_hit, 0) > 0  THEN l_hit + 2 ELSE l_len END;    
    l_cnt := l_cnt + 1;
  end loop;
EXCEPTION
  WHEN OTHERS THEN
     DBMS_OUTPUT.PUT_LINE('Error : '||SQLERRM);
END ETL_HELPER_PARSE;
RoadVampire
  • 89
  • 2
  • 3
1

sample for dynamicly length of rows

and alternative for UNIX and WIN files

and CR/LF on end of file or without it


Create table for TEST

drop table pbrev.test_SVT_tmp;
create table pbrev.test_SVT_tmp (xc clob);
insert into pbrev.test_SVT_tmp (xc) values (
--'azertyuiop;11' || chr(13) || chr(10) ||'qsdfghjklm;7878' || chr(13) || chr(10) ||'wxcvbn;0' || chr(13) || chr(10) );
'azertyuiop;11' || chr(13) || chr(10) ||'qsdfghjklm;7878' || chr(13) || chr(10) ||'wxcvbn;0' );
'azerty jhjh  huiop;11
qsdfgkj  hjklhhhhhhhhhhhm;7878
wxcvbn;0
dkjsk kjdsk5456 4654 5646 54645
FINISH'
);
delete from pbrev.test_SVT_tmp ;
select xc from pbrev.test_SVT_tmp;
--SET SERVEROUTPUT ON;
--SET SERVEROUTPUT OFF;
declare
    nStartIndex number := 1;
    nEndIndex number := 1;
    nLineIndex number := 0;
    vLine varchar2(2000);
    cursor c_clob is
    select xc from pbrev.test_SVT_tmp;
    c clob;
    procedure printout
       (p_clob in out nocopy clob) is
      offset number := 1;
      amount number := 32767;
      amount_last number := 0;
      len    number := dbms_lob.getlength(p_clob);
      lc_buffer varchar2(32767);
      line_seq pls_integer := 1;
      -- For UNIX type file - replace CHR(13) to NULL
      CR char := chr(13);
      --CR char := NULL;
      LF char := chr(10);      
      nCRLF number;
      sCRLF varchar2(2);
      b_finish boolean := true;
begin
      sCRLF := CR || LF;
      nCRLF := Length(sCRLF);
      if ( dbms_lob.isopen(p_clob) != 1 ) then
        dbms_lob.open(p_clob, 0);
      end if;
      amount := instr(p_clob, sCRLF, offset);
      while ( offset < len )
      loop
        -- For without CR/LF on end file
        If amount < 0 then
          amount := len - offset + 1;
          b_finish := false;
        End If;
        dbms_lob.read(p_clob, amount, offset, lc_buffer);
        If b_finish then
          lc_buffer := SUBSTR(lc_buffer,1,Length(lc_buffer)-1);  
        End If;
        if (line_seq-1) > 0 then
          amount_last := amount_last + amount;
          offset := offset + amount; 
        else
          amount_last := amount;
          offset := amount + nCRLF;
        end if;
        amount := instr(p_clob, sCRLF, offset);
        amount := amount - amount_last;
        dbms_output.put_line('Line #'||line_seq||': '||lc_buffer);
        line_seq := line_seq + 1;
      end loop; 
      if ( dbms_lob.isopen(p_clob) = 1 ) then
        dbms_lob.close(p_clob);
      end if; 
    exception
      when others then
         dbms_output.put_line('Error : '||sqlerrm);
    end printout;
begin
    open c_clob;
    loop
       fetch c_clob into c;
       exit when c_clob%notfound;
       printout(c);
    end loop;
    close c_clob;
end;
1

This is a follow-up answer based on @Pierre-Gilles Levallois his answer. Since I think it contained some bugs, I've tried to fix them.

  • Empty lines caused errors
  • Newlines where still in the print output (might be desired)
  • "amount :=" was not inside the while loop, i think this caused a big bug if any value was shorter than the first line in the CLOB

I've implemented this as rather quick and dirty fixes. I'm sure there should be more elegant solutions... Here goes. The example table:

create table test (c clob);

insert into test (c) values (
-- line 1 (empty)
chr(10)||'line 2'
||chr(10) -- line 3 (empty)
||chr(10)||'line 4'
||chr(10)||'line 5'
||chr(10)); -- line 6 (empty)

And the altered code:

set serveroutput on;
declare
    cursor c_clob is
    select c from test;

    c clob;
    -------------------------------
    procedure printout
       (p_clob in out nocopy clob) is
      offset number := 1;
      amount number := 32767;
      len    number := dbms_lob.getlength(p_clob);
      lc_buffer varchar2(32767);
      i pls_integer := 1;
    begin
      if ( dbms_lob.isopen(p_clob) != 1 ) then
        dbms_lob.open(p_clob, 0);
      end if;
      while ( offset < len )
          loop
            -- If no more newlines are found, read till end of CLOB
            if (instr(p_clob, chr(10), offset) = 0) then
                amount := len - offset + 1;
            else
                amount := instr(p_clob, chr(10), offset) - offset;
            end if;

            -- This is to catch empty lines, otherwise we get a NULL error
            if ( amount = 0 ) then
                lc_buffer := '';
            else
                dbms_lob.read(p_clob, amount, offset, lc_buffer);
            end if;
            dbms_output.put_line('Line #'||i||':'||lc_buffer);

            -- This is to catch a newline on the last line with 0 characters behind it
            i := i + 1;
            if (instr(p_clob, chr(10), offset) = len) then
                lc_buffer := '';
                dbms_output.put_line('Line #'||i||':'||lc_buffer);
            end if;

            offset := offset + amount + 1;
          end loop; 
     if ( dbms_lob.isopen(p_clob) = 1 ) then
        dbms_lob.close(p_clob);
      end if; 
    exception
      when others then
         dbms_output.put_line('Error : '||sqlerrm);
    end printout;
    ---------------------------
begin
    dbms_output.put_line('-----------');
    open c_clob;
    loop
       fetch c_clob into c;
       exit when c_clob%notfound;
       printout(c);
    end loop;
    close c_clob;
end;
Wouter
  • 1,829
  • 3
  • 28
  • 34
0

Emmanuel's Answer

This is an elegant solution that works just fine with clobs exceeding 32767 chars or lines exceeding 4K characters.

ANSI Standard Query:

DECLARE
  v_tmp clob :='aaaa'||chr(10)||
               'bbb'||chr(10)||
               'ccccc';
BEGIN
  FOR rec IN (WITH clob_table(c) as (SELECT v_tmp c FROM DUAL),
                   recurse(text,line) as (SELECT regexp_substr(c, '.+', 1, 1) text,1 line
                                            FROM clob_table
                                           UNION ALL
                                          SELECT regexp_substr(c, '.+', 1, line+1),line+1
                                            FROM recurse r,clob_table
                                           WHERE line<regexp_count(c, '.+'))
            SELECT text,line FROM recurse) LOOP
    dbms_output.put_line(rec.text);
  END LOOP;
END;

Oracle Specific Query (original post):

DECLARE
  v_tmp clob :='aaaa'||chr(10)||
               'bbb'||chr(10)||
               'ccccc';
BEGIN
  FOR rec IN (WITH clob_table(c) as (SELECT v_tmp c FROM DUAL)
            SELECT regexp_substr(c, '.+', 1, level) text,level line
             FROM clob_table
          CONNECT BY LEVEL <= regexp_count(c, '.+')) LOOP
      dbms_output.put_line(rec.text);
  END LOOP;
END;
Community
  • 1
  • 1
DKATyler
  • 914
  • 10
  • 16
0

I created a table called lixo_mq:

CREATE TABLE LIXO_MQ (CAMPO1 VARCHAR2(4000))

I copied printout procedure and changed it to work different:

PROCEDURE PRINTOUT (P_CLOB IN OUT NOCOPY CLOB) IS
   V_APARTIR                     NUMBER (20);
   V_CONTAR                      NUMBER (20);
   V_LINHA                       VARCHAR2 (4000);
   V_REG                         NUMBER (20);
   V_CORINGA                     VARCHAR2 (10) := CHR (10);
   V_ERRO                        VARCHAR2 (4000);
BEGIN
   IF (DBMS_LOB.ISOPEN (P_CLOB) != 1) THEN
      DBMS_LOB.OPEN (P_CLOB, 0);
   END IF;

   V_APARTIR                  := 1;
   V_REG                      := 1;

   WHILE DBMS_LOB.INSTR (LOB_LOC                       => P_CLOB
                        ,PATTERN                       => V_CORINGA
                        ,OFFSET                        => 1
                        ,NTH                           => V_REG
                        ) > 0
   LOOP
      V_CONTAR                   :=
                 DBMS_LOB.INSTR (LOB_LOC                       => P_CLOB
                                ,PATTERN                       => V_CORINGA
                                ,OFFSET                        => 1
                                ,NTH                           => V_REG
                                )
               - V_APARTIR;

      IF V_APARTIR > 1 THEN
         V_LINHA                    :=
                         DBMS_LOB.SUBSTR (LOB_LOC                       => P_CLOB
                                         ,AMOUNT                        =>   V_CONTAR
                                                                           - 1
                                         ,OFFSET                        =>   V_APARTIR
                                                                           + 1
                                         );
      ELSE
         V_LINHA                    :=
                                 DBMS_LOB.SUBSTR (LOB_LOC                       => P_CLOB
                                                 ,AMOUNT                        => V_CONTAR
                                                 ,OFFSET                        => V_APARTIR
                                                 );
      END IF;

      INSERT INTO LIXO_MQ
                  (CAMPO1
                  )
           VALUES (   V_REG
                   || ':'
                   || V_LINHA
                  );

      COMMIT;
      V_APARTIR                  :=
                           DBMS_LOB.INSTR (LOB_LOC                       => P_CLOB
                                          ,PATTERN                       => V_CORINGA
                                          ,OFFSET                        => 1
                                          ,NTH                           => V_REG
                                          );
      V_REG                      :=   V_REG
                                    + 1;
   END LOOP;

   IF (DBMS_LOB.ISOPEN (P_CLOB) = 1) THEN
      DBMS_LOB.CLOSE (P_CLOB);
   END IF;
EXCEPTION
   WHEN OTHERS THEN
      V_ERRO                     :=    'Error : '
                                    || SQLERRM;

      INSERT INTO LIXO_MQ
                  (CAMPO1
                  )
           VALUES (V_ERRO
                  );

      COMMIT;
END PRINTOUT;
-1
    declare
    c_clob clob := empty_clob();
    c_offset number;
    c_len number;
    read_cnt number;
    prev_buf number := 0;
    read_str varchar2(32000);
    BEGIN
   -- Read the clob in to the local variable
        select c into c_clob from test;
        c_offset := 1;
   -- Get the length of the clob
        c_len := dbms_lob.getlength(c_clob);
   -- Read till the current offset is less the length of clob
    while(c_offset <= c_len)
        loop
   -- Get the index of the next new line character
           read_cnt := instr(c_clob, CHR(10), c_offset, 1);
           exit when read_cnt = 0;
   -- Read the clob in the index
           read_str := dbms_lob.substr(c_clob, read_cnt-c_offset, c_offset);                                          
           dbms_output.put_line('Line#' || read_str);
   -- Now the current offset should point after the read line
           c_offset := read_cnt+1;
           end loop;
        END;
    /