2

I am using Python 2.7 & openpyxl==2.5.11.

I want to read .xlsx files using openpyxl. Initially I was testing with files downloaded from Google Drive and everything worked nice.

Now, I tried to load some files generated with Microsoft Excel, but I see this error: raise IOError("File contains no valid workbook part")

I tried to print some variables and figure out on my own, but I lack deeper knowledge about Excel files and there are some levels of abstraction I couldn't quickly understand.

Here is the relevant piece of code where the error is raised (excel.py):

def _find_workbook_part(package):
    workbook_types = [XLTM, XLTX, XLSM, XLSX]
    for ct in workbook_types:
        part = package.find(ct)
        if part:
            return part

    # some applications reassign the default for application/xml
    defaults = set((p.ContentType for p in package.Default))
    workbook_type = defaults & set(workbook_types)
    if workbook_type:
        return Override("/" + ARC_WORKBOOK, workbook_type.pop())

    raise IOError("File contains no valid workbook part")

I have the problem on both OSX & Ubuntu, if that's relevant.

EDIT: I am not capable to reproduce the issue with files I have generated on my own. I think the problem can only be reproduced with older files. The people who had the issue were using Excel 2008 or older version to create the files, so maybe that's the problem?

Thanks in advance

giliev
  • 2,938
  • 4
  • 27
  • 47
  • Is the extension of the file that causes it to fail .xls or .xlsx? I would try opening and re-saving the problematic file in Excel and then loading it again, to see if it might be a strange/subtle incompatibility issue. – cody Nov 22 '18 at 01:28
  • It fails with both extensions. – giliev Nov 22 '18 at 01:31
  • The exception is pretty clear: the files are probably not valid XLSX files. I suggest you submit a bug report with a sample file. – Charlie Clark Nov 22 '18 at 10:18
  • 1
    @stovfl thanks for the feedback, I almost forgot we have a layer of abstraction which allows us to open `xls` files. So, it comes down to opening `xlsx` files :) – giliev Nov 22 '18 at 11:25
  • 1
    So, [edit] your Question and remove any `xls` reference to get your title right. – stovfl Nov 22 '18 at 11:32
  • Thanks, I edited the question. – giliev Nov 22 '18 at 11:51
  • I updated the question, please check the EDIT. Thanks :) – giliev Nov 25 '18 at 18:59

2 Answers2

0

this worked for me

df = pd.read_excel("Set.xlsb", sheet_name='Dataset', engine='pyxlsb')

Credits to the answer here: Using Pandas with XLSB File

Reema
  • 1
  • 1
  • The question is about opening an `.xlsx` file. Can this method or a similar method be used for those types of files? – DecimalTurn Nov 11 '22 at 18:03
-2

probably because you didn't save those .xlsx correctly, such as simply change the files suffix to xlsx.

Here's a solution that works for me:

re-save the fail-to-open files to .xlsx, even if those files already have .xlsx suffix.

if you don't know which of the files need to be resaved, start with the files with unusual large size.

Yikun
  • 29
  • 6