Why some .xlsx files are successfully open with openpyxl, but others cannot be open?

Question

I am using Python 2.7 & openpyxl==2.5.11.

I want to read .xlsx files using openpyxl. Initially I was testing with files downloaded from Google Drive and everything worked nice.

Now, I tried to load some files generated with Microsoft Excel, but I see this error: raise IOError("File contains no valid workbook part")

I tried to print some variables and figure out on my own, but I lack deeper knowledge about Excel files and there are some levels of abstraction I couldn't quickly understand.

Here is the relevant piece of code where the error is raised (excel.py):

def _find_workbook_part(package):
    workbook_types = [XLTM, XLTX, XLSM, XLSX]
    for ct in workbook_types:
        part = package.find(ct)
        if part:
            return part

    # some applications reassign the default for application/xml
    defaults = set((p.ContentType for p in package.Default))
    workbook_type = defaults & set(workbook_types)
    if workbook_type:
        return Override("/" + ARC_WORKBOOK, workbook_type.pop())

    raise IOError("File contains no valid workbook part")

I have the problem on both OSX & Ubuntu, if that's relevant.

EDIT: I am not capable to reproduce the issue with files I have generated on my own. I think the problem can only be reproduced with older files. The people who had the issue were using Excel 2008 or older version to create the files, so maybe that's the problem?

Thanks in advance

Is the extension of the file that causes it to fail .xls or .xlsx? I would try opening and re-saving the problematic file in Excel and then loading it again, to see if it might be a strange/subtle incompatibility issue. — cody, Nov 22 '18 at 01:28
The exception is pretty clear: the files are probably not valid XLSX files. I suggest you submit a bug report with a sample file. — Charlie Clark, Nov 22 '18 at 10:18
@stovfl thanks for the feedback, I almost forgot we have a layer of abstraction which allows us to open `xls` files. So, it comes down to opening `xlsx` files :) — giliev, Nov 22 '18 at 11:25
So, [edit] your Question and remove any `xls` reference to get your title right. — stovfl, Nov 22 '18 at 11:32

score 0 · Answer 1 · answered Nov 05 '22 at 15:54

0

this worked for me

df = pd.read_excel("Set.xlsb", sheet_name='Dataset', engine='pyxlsb')

Credits to the answer here: Using Pandas with XLSB File

answered Nov 05 '22 at 15:54

Reema

1
1

The question is about opening an `.xlsx` file. Can this method or a similar method be used for those types of files? – DecimalTurn Nov 11 '22 at 18:03

score -2 · Answer 2 · answered Aug 09 '20 at 04:15

-2

probably because you didn't save those .xlsx correctly, such as simply change the files suffix to xlsx.

Here's a solution that works for me:

re-save the fail-to-open files to .xlsx, even if those files already have .xlsx suffix.

if you don't know which of the files need to be resaved, start with the files with unusual large size.

answered Aug 09 '20 at 04:15

Yikun

29
6

Doesn't address the technical question. – AdamC Nov 16 '22 at 15:25

Why some .xlsx files are successfully open with openpyxl, but others cannot be open?

2 Answers2