1

I am getting the contents of my excel file in base64. I want to decode the file and load it in with xlrd in order to extract some information from it.

What I have tried:

import base64
from xlrd import open_workbook

encoded = get_file_content() # retrieving the contents of xls in base64 form
decoded = base64.b64decode(encoded)
type(decoded) #=> <class 'bytes'>

xlsfile = open('file.xls', 'wb')
xlsfile.write(decoded)
xlsfile.close()
workbook = open_workbook('file.xls')

I get the following error:

xlrd.biffh.XLRDError: Unsupported format, or corrupt file: Expected BOF record; found b'F\x8a-\x12{k\xc9j'
Jan Parzydło
  • 367
  • 4
  • 15
  • The [`xlrd.open_workbook`](https://xlrd.readthedocs.io/en/latest/api.html#xlrd.open_workbook) method expects a `filename` as its first argument. You have passed it the raw contents of what may or may not represent the file. Please consider writing out the contents assigned to `decoded` to a file with a filename, and then pass that same filename to `open_workbook`. The error you are seeing is due to attempting to call `open` with a null byte, which a binary file like Excel has at least one of. – metatoaster Feb 11 '20 at 12:28
  • right, I've edited the question accordingly, but there is still a problem with loading in the workbook. thanks – Jan Parzydło Feb 11 '20 at 12:41
  • Unfortunately, without access to the file (which undoubtedly is private to you) it would be difficult for anyone else to ascertain exactly what may have gone wrong. You may wish to try opening that resulting file using a program that can read it, and see if that works. – metatoaster Feb 11 '20 at 12:48
  • Related: [Error: Unsupported format, or corrupt file: Expected BOF record](https://stackoverflow.com/questions/16504975/error-unsupported-format-or-corrupt-file-expected-bof-record) – metatoaster Feb 11 '20 at 12:57
  • I have read the thread, but it didn't help much. The file is an unprotected XLS spreadsheet and I am not running excel at this particular moment. I put the encoded string into https://base64.guru/converter/decode/file and it did decode it succesfully. I am very confused. – Jan Parzydło Feb 11 '20 at 13:28
  • Without the ability to verify the state of the file using Excel (or an alternative, such as LibreOffice) severely hampers your ability to solve this issue you have. Once you have access to Excel to open your file you may either find that it doesn't work (indicates a corruption with your file) or that it opens correctly (which may indicate that `xlrd` cannot open that file, or that there are additional arguments which may need to be passed as per the documentation that you may need to try). – metatoaster Feb 11 '20 at 13:31
  • By 'not running excel' I meant that it's not running with the file open in the background while trying to load the same file with xlrd (one of the replies in the linked thread suggested that this may be the cause of the problem). I do have Excel and after decoding the file (using the website given above) the file succesfully opened in the application. – Jan Parzydło Feb 11 '20 at 13:34
  • Then perhaps it might be that `xlrd` does not support the particular format that you have, or you may wish to try debugging with that library yourself using `xlrd.dump`, or perhaps file an issue with the developers of this package. – metatoaster Feb 11 '20 at 13:36
  • Or perhaps you may need to try to save it as a `.xlsx` file instead of `.xls`, I don't know how `xlrd` determines how that file may be opened. – metatoaster Feb 11 '20 at 13:37
  • I think the problem lies within decoding the content of my file rather than xlrd of the file itself. How can I compare the output (bytes) of an online decoder with that of the ```b64decode``` function? – Jan Parzydło Feb 11 '20 at 13:54
  • 1
    It could also have something to do with how the original file was encoded, too. You may wish to download the result from the base64.guru decoder as instructed, and compare the checksum (e.g. md5sum) and see if they match, and failing that, compare against the hexdump of the file. You really should compare using the original file before it was encoded with base64. – metatoaster Feb 11 '20 at 13:59
  • 1
    After comparing hexdumps I realised that the files I'm receiving are not at all base64 encoded and so the b64decode function was returning incorrect values. Case closed. – Jan Parzydło Feb 11 '20 at 14:49

0 Answers0