1

The Oracle database I use stores files in PDF or ZIP format in a BLOB column. I want to save these files to disk, but I do not know how to tell whether a given BLOB holds a PDF or a ZIP. Is it possible to check which file format a BLOB contains?

Below is a simple write_file function for saving a file:

def write_file(data, filename):
    with open(filename, 'wb') as f:
        f.write(data)

Here I fetch the appropriate BLOB with the cursor and use write_file to save it:

firstRow = cur.fetchone()
write_file(firstRow[0].read(), "blah.zip")

How can I recognize whether the BLOB is a ZIP or a PDF?

Mozgawa

1 Answer

4

You can check the file signature (the "magic number") by inspecting the first bytes you read.

According to https://en.wikipedia.org/wiki/List_of_file_signatures:

1) A ZIP file starts with "50 4B 03 04", "50 4B 05 06" (empty archive), or "50 4B 07 08" (spanned archive). The first two bytes are always the ASCII characters "PK".

2) A PDF file starts with "25 50 44 46 2D", which is the ASCII string "%PDF-".

So read the first few bytes of the BLOB and compare them with these signatures to determine the file type.
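
The signature check can be sketched in Python as follows. This is a minimal sketch, not a complete file-type detector: the names detect_extension and save_blob are hypothetical helpers, and the '.bin' fallback for unrecognized data is an assumption. It only tests the leading bytes listed above and does not validate the rest of the file.

```python
# Signatures taken from the Wikipedia list of file signatures.
ZIP_SIGNATURES = (b"PK\x03\x04", b"PK\x05\x06", b"PK\x07\x08")
PDF_SIGNATURE = b"%PDF-"


def detect_extension(data):
    """Return '.zip', '.pdf', or None based on the leading bytes of data."""
    # bytes.startswith accepts a tuple and matches any of its members.
    if data.startswith(ZIP_SIGNATURES):
        return ".zip"
    if data.startswith(PDF_SIGNATURE):
        return ".pdf"
    return None


def save_blob(data, basename):
    """Hypothetical helper: write data to basename plus the detected extension."""
    # Falling back to '.bin' for unknown content is an assumption of this sketch.
    ext = detect_extension(data) or ".bin"
    filename = basename + ext
    with open(filename, "wb") as f:
        f.write(data)
    return filename
```

With the cursor from the question, the call would look like save_blob(firstRow[0].read(), "blah"), which writes either blah.zip or blah.pdf depending on the content.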

  • I inspected it: a ZIP starts with 'PK' when the bytes are decoded as ISO 8859-1, and the hex signature is indeed "50 4B 03 04", "50 4B 05 06" or "50 4B 07 08". That's what I meant. You solved my problem. Thank you. – Mozgawa Oct 10 '19 at 15:22