File corruption with UTL_FILE script

Question

I have definitely searched FAR and WIDE for an answer to this, but I can't find anything! I am using a UTL_FILE script to pull down some file BLOBS from an Oracle table and save them to a file directory. It's working for a lot of the files, but I have narrowed it down by process of elimination that it's having issues for files that have an "unconventional" file name, albeit still a valid one, the files are getting corrupted in the transfer. They may have only been 30kb originally, but export as 5kb and cannot be opened. So I know it's not a large file size issue. The files open just fine through the application, have a valid MIME type encoding, and would otherwise open fine on a file system, but UTL_FILE doesn't seem to like them. They are files that have an extra "." in them ie: john.smith.doc, or a pound sign ie: Smith #12345.doc or parentheses, etc. I cannot change the source file names in the Oracle table, but I have been concatenating an ID number on to them when saving them out so I can reference it as a key for an ETL load into SQL file table later. Maybe I also need to write a complicated REGEXP to rename the files on the fly and strip out the bad characters, but I'm not sure that will work because I don't know at what point UTL_FILE is choking on them. If it's at the source, then that won't help. Has anyone else encountered this problem? Here is my script:

DECLARE

CURSOR C1 IS Select FILE_ID || '---' || substr(DOCUMENTLOCATION,1,instr 
(DOCUMENTLOCATION,'.')-1)||'.doc' as FILE_NAME, FILE_BLOB, FILE_ID
From DOCUMENTS d inner join CASEJOURNAL c on d.FILE_ID = c.JOURNALENTRYID 
where (JOURNAL_ENTRY_TYPE = 117 or JOURNAL_ENTRY_TYPE = 3) AND 
c.DOCUMENTLOCATION Is Not Null AND d.MIME_TYPE = 'application/msword' 
AND FILE_ID BETWEEN 785 AND 3380;

  l_file      UTL_FILE.FILE_TYPE;
  l_buffer    RAW(32000);
  l_amount    INTEGER := 32000;
  l_pos       INTEGER := 1;
  l_blob      BLOB;
  l_blob_len  INTEGER;
  l_filename  varchar2(255); 

BEGIN
--Select BLOB file into variables
FOR I in C1 
LOOP
Select FILE_ID || '---' || substr(DOCUMENTLOCATION,1,instr
(DOCUMENTLOCATION,'.')-1) ||'.doc' as FILE_NAME, FILE_BLOB INTO l_filename, 
l_blob From DOCUMENTS d inner join CASEJOURNAL c on d.FILE_ID = 
c.JOURNALENTRYID   where (JOURNAL_ENTRY_TYPE = 117 or JOURNAL_ENTRY_TYPE =  
3) AND c.DOCUMENTLOCATION Is Not Null AND d.MIME_TYPE 
= 'application/msword' and d.FILE_ID = I.FILE_ID;

-- Define the output directory
l_file := UTL_FILE.FOPEN('\\myfiledirectory',l_filename,'wb',32000);
l_pos := 1;
l_amount := 32000;

--Get length of BLOB file and save to variable.
l_blob_len := DBMS_LOB.getlength(l_blob);

-- Write the data to the file
--If small enough for single write:
    IF l_blob_len < 32000 THEN
        UTL_FILE.PUT_RAW (l_file, l_blob);
        UTL_FILE.FFLUSH(l_file);
        --Write in pieces if larger than 32k
        ELSE
        l_pos := 1;
        WHILE l_pos < l_blob_len AND  l_amount > 0
        LOOP
        DBMS_LOB.read(l_blob, l_amount, l_pos, l_buffer);
        UTL_FILE.PUT_RAW(l_file, l_buffer);
        UTL_FILE.FFLUSH(l_file);
        --Set start position for next write
        l_pos := l_pos + l_amount;
        --Set end position if less than 32k.
        l_blob_len := l_blob_len - l_amount;
            IF l_blob_len < 32000 THEN
            l_amount := l_blob_len;
            END IF;
        END LOOP;
  END IF;
UTL_FILE.FCLOSE(l_file);
END LOOP;
END;

Alex Poole · Answer 1 · 2015-01-22T14:40:31.787

The file name isn't going to affect how the bytes are written out once the file has been opened. You seem to be truncating the file if it's more than 32k. Your loop does this:

    WHILE l_pos < l_blob_len AND  l_amount > 0
    LOOP

... but then you change both l_pos and l_blob_len within the loop; once the adjusted l_pos falls below the remaining l_blob_len you exit the loop, too early. You don't need to adjust l_blob_len, or even adjust l_amount - that is the maximum number of bytes to read, it doesn't matter if it's higher than what's left.

So change the loop to:

    WHILE l_pos < l_blob_len AND  l_amount > 0
    LOOP
      DBMS_LOB.read(l_blob, l_amount, l_pos, l_buffer);
      UTL_FILE.PUT_RAW(l_file, l_buffer);
      UTL_FILE.FFLUSH(l_file);
      --Set start position for next write
      l_pos := l_pos + l_amount;
    END LOOP;

Not really related to your problem, but you don't need to reselect the data inside your cursor loop. You've already got the values you need in your i cursor variable, so you can do:

FOR I in C1 
LOOP
l_filename := i.file_name;
l_blob := i.file_blob;

-- Define the output directory
...

Or don't bother with the l_filename and l_blob local variables at all; since you only refer to them inside the cursor loop anyway, use i.file_name and i.file_blob directly everywhere, e.g.

l_file := UTL_FILE.FOPEN('\\myfiledirectory',i.file_name,'wb',32000);
l_blob_len := DBMS_LOB.getlength(i.file_blob);

etc.

UPDATE--I found that the files that were being corrupted upon extract had nothing to do with my script, nor with the naming convention they had in the DB. It was because the front end application was compressing some of the files upon upload to the DB. There was no way to tell which were compressed and which weren't (no sort of flag in the DB). However, when I applied UTL_FILE_COMPRESS.lz_uncompress() to the files that were giving me problems, they opened perfectly--obviously. — kharvey, Apr 06 '15 at 19:08

File corruption with UTL_FILE script

1 Answers1