1

We have been running a stored procedure to extract BLOBS from a database every day for roughly the last 5 years. We generally extract around 25 files each night, with the majority being approximately 500KB in size and a handful nearing 10,000KB.

The process has never been the fastest but after we moved data centres this process can take upwards of 12 hours. That in itself is shocking when you are only extracting ~55MB or so. We've engaged all the relevant teams to take a look at the performance of Oracle, disk I/O etc and they claim that everything is perfect.

I've been reading about UTL_FILE and DBMS_LOB.read, seen people talk about resetting the pos after each loop etc. To be honest I can't seem to figure out what any of this means, and the general consensus is that there are far better ways of achieving the same outcome.

Unfortunately we do not have the liberty of refactoring this, so can anyone see anything glaringly wrong with our procedure? I'm just struggling with something I do not fully understand and those who maintain our infrastructure blame it all on this code and wash their hands of it.

    CREATE OR REPLACE PROCEDURE PKG_EXTRACT (l_brand IN VARCHAR2) AS

   l_file      UTL_FILE.FILE_TYPE;
   l_buffer    RAW(32767);
   l_amount    BINARY_INTEGER := 32767;
   l_pos       INTEGER;
   l_blob      BLOB;
   l_blob_len  INTEGER;
   x  NUMBER;
   l_file_name VARCHAR2(200);
   l_count       INTEGER := 1;
   v_code  NUMBER;
   v_errm  VARCHAR2(64);
 log_file      UTL_FILE.FILE_TYPE;
rec_num number;

BEGIN
DECLARE
   CURSOR extract_cur IS
      SELECT DATA, BIN_NAME
      FROM   STAGING
      WHERE UPPER(EXTRACTED)='N';

BEGIN
log_file := UTL_FILE.fopen('DATA_DOWNLOAD_DIR','pkg_extract.log','a', 32767);

UTL_FILE.put_line(log_file,'Logging is being done in 24 hours format - V1.5 ',TRUE);
 UTL_FILE.put_line(log_file,'Extract procedure started on Date-Time = '|| TO_TIMESTAMP(LOCALTIMESTAMP, 'DD-MON-RR,HH24.MI.SSXFF'),TRUE);


select count(1) into rec_num from staging;

UTL_FILE.put_line(log_file,'Total Number of records found = ' || rec_num , TRUE);
select count(1) into rec_num from staging where UPPER(EXTRACTED)='N';
UTL_FILE.put_line(log_file,'Total Number of records matching criteria = ' || rec_num , TRUE);

   dbms_output.put_line('Loop through records and write them to file');
   FOR extract_rec IN extract_cur
   LOOP

      l_pos := 1;
      l_blob := extract_rec.DATA;
      l_blob_len := DBMS_LOB.getlength(l_blob);

      -- Save blob length.
      x := l_blob_len;

      l_file_name := extract_rec.BIN_NAME ;

      -- Open the destination file.
      dbms_output.put_line('Open the destination file:- ' || l_file_name);
      l_file := UTL_FILE.fopen('DATA_DOWNLOAD_DIR',l_file_name,'wb', 32767);
      dbms_output.put_line('File opened');

      -- Read chunks of the BLOB and write them to the file until complete.
      dbms_output.put_line('l_pos:- ' || l_pos);
      dbms_output.put_line('l_blob_len:- ' || l_blob_len);
      WHILE l_pos <= l_blob_len
      LOOP
         dbms_output.put_line('DBMS_LOB.read from position: ' || l_pos);
         DBMS_LOB.read(l_blob, l_amount, l_pos, l_buffer);
         dbms_output.put_line('UTL_FILE.put_raw');
         UTL_FILE.put_raw(l_file, l_buffer, TRUE);
         dbms_output.put_line('Written ' || l_amount || ' bytes of data starting at position: ' || l_pos);

         -- Set the start position for the next cut.
         l_pos := l_pos + l_amount;

    --updating the extract field

        dbms_output.put_line(extract_rec.BIN_NAME);

      END LOOP;

      l_count := l_count + 1;
      -- Close the file.
      dbms_output.put_line('Close the file:- ' || l_file_name);
      UTL_FILE.fclose(l_file);

      update staging set extracted='Y', extract_timestamp=sysdate where bin_name=extract_rec.BIN_NAME;
      commit;

   END LOOP;

 UTL_FILE.put_line(log_file,'Extract procedure Completed on Date-Time = '|| TO_TIMESTAMP(LOCALTIMESTAMP, 'DD-MON-RR,HH24.MI.SSXFF'),TRUE);

 IF UTL_FILE.is_open(log_file) THEN
      UTL_FILE.fclose(log_file);
end if;
END;

EXCEPTION
   WHEN OTHERS THEN
   v_code := SQLCODE;
   v_errm := SUBSTR(SQLERRM, 1, 64);
   dbms_output.put_line('Error code ' || v_code || ': ' || v_errm);
UTL_FILE.put_line(log_file,'--------------------------------------' ,TRUE);
 UTL_FILE.put_line(log_file,'Error Occurred while executing '||'Error code ' || v_code || ': ' || v_errm ,TRUE);
 UTL_FILE.put_line(log_file,'Extract procedure Completed with errors - '|| TO_TIMESTAMP(LOCALTIMESTAMP, 'DD-MON-RR,HH24.MI.SSXFF'),TRUE);
UTL_FILE.put_line(log_file,'--------------------------------------' ,TRUE);

   -- Close the file if something goes wrong.
   IF UTL_FILE.is_open(l_file) THEN

      UTL_FILE.fclose(l_file);

 IF UTL_FILE.is_open(log_file) THEN
      UTL_FILE.fclose(log_file);
end if;
   END IF;
   RAISE;
END;
/

EDIT

Execution Plan for the CURSOR extract_cur.

Plan hash value: 3428151562
-----------------------------------------------------------------------------
| Id  | Operation         | Name    | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |         |     6 |   678 |     9   (0)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| STAGING |     6 |   678 |     9   (0)| 00:00:01 |
-----------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter(UPPER("S"."EXTRACTED")='N')
hshah
  • 842
  • 4
  • 14
  • 35
  • Did you try gathering stats on the `staging` table? – Gurwinder Singh Dec 07 '16 at 04:22
  • @GurwinderSingh that is generally done on the database every month. It isn't a particularly large database either... 25 odd records per day and we only keep 30 days worth. – hshah Dec 07 '16 at 05:24
  • 1
    can you get the execution plan for the cursor's query and paste in the question? – Gurwinder Singh Dec 07 '16 at 05:34
  • @GurwinderSingh, took me some time to actually do it because I had to figure out how, but finally managed it. Last night's run of this started at 23:00 (just over 8 hours ago) and there were 23 BLOBS to fetch. It did the first 21 in 45 minutes (total of 35MB), then number 22 took 30 mins (only 600KB in size) and now the last one which is around 15MB has been going for just under 7 hours now and it will probably take a few more hours as its only done 10MB. It just gets slower and slower as it goes through them. – hshah Dec 07 '16 at 07:09
  • now this eats a lot of time 'code' SELECT DATA, BIN_NAME FROM STAGING WHERE UPPER(EXTRACTED)='N' 'code' – TheName Dec 07 '16 at 07:26
  • @TheName, does it? It usually takes less than 1 second to execute... – hshah Dec 07 '16 at 07:30
  • 1
    I can suggest not to use UPPER in the where clause, you can update the existing records to UPPER and add a trigger on STAGING table which checks when the inserting then extracted = upper(:new.extracted) try this before moving on and check the performance after not using the UPPER. – TheName Dec 07 '16 at 07:34
  • 1 second is a lot of time :) – TheName Dec 07 '16 at 07:35
  • 1
    also remove all dbms_output.put_line because it also eats a lot of memmory – TheName Dec 07 '16 at 07:37
  • @TheName - the N's are already all in capitals in the database, so that part of the query can easily be changed to remove the UPPER part, as well as removing the logging comments. However, one thing I do not understand is that the query runs, gets what it needs to process and then it is the extract itself which slows down over time. What difference would improving the query do to the extract. I apologise in advance if I am missing something obvious here. – hshah Dec 12 '16 at 14:42

1 Answers1

0

ok so i changed your code, try to avoid these things select count(1) into rec_num from staging because what oracle does here it converts pl/sql to sql and sql to pl/sql so better use cursor, here is the changed code. And try adding index create index extracted_idx on staging(extracted asc); so here is the modified code: I do not know if it's compiles because I can not check it, but it should work.

CREATE OR REPLACE PROCEDURE PKG_EXTRACT(l_brand IN VARCHAR2) AS

  l_file      UTL_FILE.FILE_TYPE;
  l_buffer    RAW(32767);
  l_amount    BINARY_INTEGER := 32767;
  l_pos       INTEGER;
  l_blob      BLOB;
  l_blob_len  INTEGER;
  x           NUMBER;
  l_file_name VARCHAR2(200);
  l_count     INTEGER := 1;
  v_code      NUMBER;
  v_errm      VARCHAR2(64);
  log_file    UTL_FILE.FILE_TYPE;
  rec_num     number;

  CURSOR extract_cur IS
    SELECT DATA, 
               BIN_NAME 
            FROM STAGING 
         WHERE EXTRACTED = 'N';

  cursor c_count is
    select count(1) cnt 
          from staging;

  cursor c_rec_num is
    select count(1)  cnt
         from staging 
        where EXTRACTED = 'N';

BEGIN
  log_file := UTL_FILE.fopen('DATA_DOWNLOAD_DIR',
                             'pkg_extract.log',
                             'a',
                             32767);

  UTL_FILE.put_line(log_file,
                    'Logging is being done in 24 hours format - V1.5 ',
                    TRUE);
  UTL_FILE.put_line(log_file,
                    'Extract procedure started on Date-Time = ' ||
                    TO_TIMESTAMP(LOCALTIMESTAMP, 'DD-MON-RR,HH24.MI.SSXFF'),
                    TRUE);

  open c_cnt;
  fetch c_cnt into rec_num;
  close c_cnt;

  open c_rec_num;
  fetch c_rec_num into rec_num;
  close c_rec_num;

  UTL_FILE.put_line(log_file,
                    'Total Number of records found = ' || rec_num,
                    TRUE);
  --dont know why you doing this again
  open c_rec_num;
  fetch c_rec_num into rec_num;
  close c_rec_num;

  UTL_FILE.put_line(log_file,
                    'Total Number of records matching criteria = ' ||
                    rec_num,
                    TRUE);

  FOR extract_rec IN extract_cur LOOP

    l_pos      := 1;
    l_blob     := extract_rec.DATA;
    l_blob_len := DBMS_LOB.getlength(l_blob);

    -- Save blob length.
    x := l_blob_len;

    l_file_name := extract_rec.BIN_NAME;

    -- Open the destination file.

    l_file := UTL_FILE.fopen('DATA_DOWNLOAD_DIR', l_file_name, 'wb', 32767);

    -- Read chunks of the BLOB and write them to the file until complete.

    WHILE l_pos <= l_blob_len LOOP

      DBMS_LOB.read(l_blob, l_amount, l_pos, l_buffer);

      UTL_FILE.put_raw(l_file, l_buffer, TRUE);

      -- Set the start position for the next cut.
      l_pos := l_pos + l_amount;

    --updating the extract field

    END LOOP;

    l_count := l_count + 1;
    -- Close the file.
    UTL_FILE.fclose(l_file);

    update staging
       set extracted = 'Y', 
                 extract_timestamp = sysdate
     where bin_name = extract_rec.BIN_NAME;
    commit;

  END LOOP;

  UTL_FILE.put_line(log_file,
                    'Extract procedure Completed on Date-Time = ' ||
                    TO_TIMESTAMP(LOCALTIMESTAMP, 'DD-MON-RR,HH24.MI.SSXFF'),
                    TRUE);

  IF UTL_FILE.is_open(log_file) THEN
    UTL_FILE.fclose(log_file);
  end if;
END;

EXCEPTION
WHEN OTHERS THEN 
        v_code := SQLCODE; v_errm := SUBSTR(SQLERRM, 1, 64); 
        UTL_FILE.put_line(log_file, '--------------------------------------', TRUE); 
        UTL_FILE.put_line(log_file, 'Error Occurred while executing ' || 'Error code ' || v_code || ': ' || v_errm, TRUE);
        UTL_FILE.put_line(log_file, 'Extract procedure Completed with errors - ' || TO_TIMESTAMP(LOCALTIMESTAMP, 'DD-MON-RR,HH24.MI.SSXFF'), TRUE); 
        UTL_FILE.put_line(log_file, '--------------------------------------', TRUE);

  -- Close the file if something goes wrong.
    IF UTL_FILE.is_open(l_file) THEN

    UTL_FILE.fclose(l_file);

        IF UTL_FILE.is_open(log_file) THEN 
            UTL_FILE.fclose(log_file);
        end if;
    END IF; 
END;
/
TheName
  • 697
  • 1
  • 7
  • 18
  • I've just been going through this to try and understand the changes. This isn't my strong point but I think I am getting there... after all one of the objectives of asking for help on here is to learn. I have a question about the code around the comment "dont know why you doing this again". It doesn't look quite right to me because the original gets a total row count and assigns it to rec_num before outputting it, then assigns the count of "N" to the same variable and outputs that. Yours is opening "c_cnt" which I think should be "c_count" as that's what is defined at the top. – hshah Dec 14 '16 at 17:52
  • I'm definitely going to have to rename "cnt" because I am only seeing one word every time I read it ;) – hshah Dec 14 '16 at 17:56
  • Did you manage to have a look at this? Am I on the right lines? – hshah Jan 12 '17 at 16:17