I'm using xlrd to process .xls files, and openpyxl to process .xlsx files, and this is working well.
Then I'm handed what is ostensibly a .xls file, so I try xlrd.open_workbook() on it and get:
XLRDError: Unsupported format, or corrupt file: Expected BOF record; found '<?xml ve'
I take a look at this question, and I surmise that my file, although ending with extension .xls, must actually be a .xlsx. And indeed, I can view it in a text editor:
<?xml version="1.0" encoding="UTF-8"?>
<Workbook xmlns="urn:schemas-microsoft-com:office:spreadsheet"
xmlns:x="urn:schemas-microsoft-com:office:excel"
xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet"
xmlns:html="http://www.w3.org/TR/REC-html40">
:
:
:
(for privacy reasons, I can't post the whole file, but it's probably not required for our analysis).
So I surmise that if I just copy (cp) it to a .xlsx, I should be able to open it with openpyxl.load_workbook(), but I get:
BadZipfile: File is not a zip file
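Since the extension clearly can't be trusted here, a check on the file's leading bytes might be more reliable than renaming it. Below is a minimal sketch of that idea. The function name sniff_spreadsheet_format and the labels it returns are hypothetical, chosen only for illustration. It relies on two facts: binary .xls files are OLE2 compound documents that start with a fixed 8-byte signature, and .xlsx files are ZIP archives.

```python
import zipfile

# Signature at the start of every OLE2 compound document (binary .xls).
OLE2_MAGIC = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"


def sniff_spreadsheet_format(path):
    """Guess a file's real format from its content, ignoring the extension.

    Hypothetical helper: returns "ole2" (binary .xls), "zip" (e.g. .xlsx),
    "xml" (some XML text format) or "unknown".
    """
    with open(path, "rb") as f:
        head = f.read(64)

    if head.startswith(OLE2_MAGIC):
        return "ole2"
    if zipfile.is_zipfile(path):
        # Any ZIP archive matches here, not only .xlsx.
        return "zip"
    # Skip an optional UTF-8 byte order mark and leading whitespace.
    if head.lstrip(b"\xef\xbb\xbf \t\r\n").startswith(b"<?xml"):
        return "xml"
    return "unknown"
```

For my file this should return "xml", which would match both error messages: xlrd found '<?xml ve' where it expected a BOF record, and openpyxl found no ZIP archive.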
If it's actually an xls (unlikely) but can't be opened with xlrd, and if it is actually an xlsx but can't be opened with openpyxl, even after I cp it to a .xlsx, what should I do?
Note: If I open the .xls in Excel, save it as a .xlsx, and retry with openpyxl, it does load fine, but I won't have the luxury of this manual step when my program runs.
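One fallback I could try is to parse the XML directly with the standard library. This is only a sketch. It assumes the part of the file I can't post follows the Excel 2003 "XML Spreadsheet" layout (Workbook > Worksheet > Table > Row > Cell > Data), which the urn:schemas-microsoft-com:office:spreadsheet namespace suggests but which I haven't confirmed. The function name read_spreadsheetml is hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace taken from the <Workbook> element shown above.
SS = "urn:schemas-microsoft-com:office:spreadsheet"
NS = {"ss": SS}


def read_spreadsheetml(path):
    """Return {sheet_name: [[cell_text, ...], ...]} for an XML Spreadsheet file.

    Assumes the Workbook > Worksheet > Table > Row > Cell > Data layout.
    Every value comes back as a string, or None for an empty cell.
    """
    root = ET.parse(path).getroot()
    sheets = {}
    for worksheet in root.findall("ss:Worksheet", NS):
        name = worksheet.get(f"{{{SS}}}Name")
        rows = []
        for row in worksheet.findall("ss:Table/ss:Row", NS):
            values = []
            for cell in row.findall("ss:Cell", NS):
                # ss:Index (1-based) means empty columns were skipped
                # before this cell, so pad the row with None.
                index = cell.get(f"{{{SS}}}Index")
                if index is not None:
                    values.extend([None] * (int(index) - 1 - len(values)))
                data = cell.find("ss:Data", NS)
                values.append(data.text if data is not None else None)
            rows.append(values)
        sheets[name] = rows
    return sheets
```

This ignores ss:Index on rows, merged cells, and the ss:Type attribute, so numbers and dates would still need converting from text.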