
I have a CSV file with 2,000,000 records stored in a table (in a BLOB column). I created the following block:

DECLARE
    v_doc          BLOB;
    v_file_content CLOB;
    zip_files      as_zip.file_list;
BEGIN
    -- Get the file corresponding to the process in progress
    SELECT file_content
    INTO   v_doc
    FROM   int_dat_journals_cv027_files
    WHERE  file_name = 'in_gl_journals_20230322101527_v3.zip' -- this value is dynamic
    AND    id_proceso_oic = '1';                              -- this value is dynamic

    -- Get the list of files inside the .zip file
    zip_files := as_zip.get_file_list(v_doc);

    -- Iterate over each file in the .zip file
    FOR i IN zip_files.first .. zip_files.last LOOP
        -- Convert the BLOB entry to a CLOB
        SELECT blob_to_clob_fn(as_zip.get_file(v_doc, zip_files(i)))
        INTO   v_file_content
        FROM   dual;

        --dbms_output.put_line(v_file_content);
        -- Iterate line by line, splitting on CHR(10); a line looks like 'a,b,c,,,f,g'
        FOR linea IN (
            SELECT TRIM(column_value) l
            FROM   dual,
                   xmltable ( ( '"'
                                || replace(v_file_content, CHR(10), '","')
                                || '"' ) )
        ) LOOP
            dbms_output.put_line(linea.l);
        END LOOP;
    END LOOP;
EXCEPTION
    WHEN OTHERS THEN
        dbms_output.put_line('ERROR: ' || sqlerrm);
        dbms_output.put_line(dbms_utility.format_error_backtrace);
END;

I need to specifically optimize this section to be able to process millions of records:

FOR line IN ( SELECT TRIM(column_value) l FROM dual, xmltable ( ( '"' || replace(v_file_content, CHR(10), '","') || '"' ) ) ) LOOP dbms_output.put_line(line.l); END LOOP;

How can it be optimized?

Cesar Tepetla
  • Besides row-by-row processing in general (never fast), using `DBMS_OUTPUT` and outputting the data to a console/terminal display is likely the slowest thing in your code. The bottleneck won't be in your code, but rather in the terminal's ability to display the data as fast as it is returned. It could easily still be scrolling from its buffer long after the database has selected the actual data. What is the purpose of returning all that data to the terminal? – pmdba Mar 22 '23 at 23:43
  • ... that is also assuming that the `DBMS_OUTPUT` buffer can handle all your data. It wasn't designed with this use in mind; it's intended for displaying small error or troubleshooting messages, not for processing actual data. – pmdba Mar 22 '23 at 23:50
  • Several answers here: [Reading clob line by line with pl\sql](https://stackoverflow.com/questions/11647041/reading-clob-line-by-line-with-pl-sql). – Ponder Stibbons Mar 22 '23 at 23:52
  • Hi @pmdba, the use of DBMS_OUTPUT is only temporary. My block actually performs field-by-field segmentation on each line obtained in the iteration and then inserts the result into a table; DBMS_OUTPUT is not the problem. The problem is how to optimize the FOR-loop part with the SELECT statement. – Cesar Tepetla Mar 23 '23 at 00:06
  • There's no actual processing of the data, because the `loop` body contains only `dbms_output` and, as said earlier, the bottleneck is in the terminal. Please describe what you want to do with the file contents. It would possibly be better to use Oracle Loader or an external table to ingest your data, and just save the file in some local folder, instead of quite slow row-by-row processing. – astentx Mar 23 '23 at 07:31

1 Answer


You don't have to do that. You can use the CSV as an external table. Of course, you have to copy the data into another table if you want to index it or use it efficiently.
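The DDL below assumes `csv_dir` is an Oracle directory object pointing at the OS folder that holds the file. If it does not exist yet, here is a minimal sketch of creating it (the OS path and the grantee are placeholders you would need to adjust; run as a suitably privileged user):

CREATE OR REPLACE DIRECTORY csv_dir AS '/path/to/csv_dir'; -- placeholder OS path
GRANT READ ON DIRECTORY csv_dir TO your_schema;            -- placeholder schema name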

If your file is located at csv_dir/yourfile.csv, you can define the external table like this:

DROP TABLE csv_test_table;

CREATE TABLE csv_test_table (
  COLUMN1 VARCHAR2(255),
  COLUMN2 VARCHAR2(255),
  COLUMN3 VARCHAR2(255),
  COLUMN4 VARCHAR2(255)
)
ORGANIZATION EXTERNAL (
  DEFAULT DIRECTORY csv_dir
  ACCESS PARAMETERS (
    RECORDS DELIMITED BY NEWLINE
    SKIP 1
    FIELDS TERMINATED BY ';' OPTIONALLY ENCLOSED BY '"' -- or use ',' here
    LRTRIM
    MISSING FIELD VALUES ARE NULL
  )
  LOCATION ('yourfile.csv')
);
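Querying csv_test_table reads and parses the file on the fly, so the 2,000,000 rows can be loaded with a single set-based statement instead of a row-by-row loop. A minimal sketch, assuming a target table journal_lines with matching columns (the table name and columns are placeholders for your real structure):

-- Hypothetical target table; adjust the columns to your real structure
CREATE TABLE journal_lines (
  COLUMN1 VARCHAR2(255),
  COLUMN2 VARCHAR2(255),
  COLUMN3 VARCHAR2(255),
  COLUMN4 VARCHAR2(255)
);

-- Direct-path, set-based load: one INSERT ... SELECT instead of a PL/SQL loop
INSERT /*+ APPEND */ INTO journal_lines
SELECT COLUMN1, COLUMN2, COLUMN3, COLUMN4
FROM   csv_test_table;
COMMIT;

The `/*+ APPEND */` hint requests a direct-path insert, which avoids the per-row overhead of conventional inserts; this is where the bulk of the speedup over the original loop comes from.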
babicsp