I am having troubles opening excel files (xlsx) in pandas. I had a small application that used to work, but then another application with the same code was failing miserably when opening the same files.
The files have not changed since and I cannot change them (they are provided externally)
After some digging I was able to find out that this seems to be linked to the version of openpyxl:
- If the version is set to
openpyxl==3.0.9
: files are opened without failing. - If the version is
openpyxl==3.1.1
: 2 out of three files fail to open raising exception'ReadOnlyWorksheet' object has no attribute 'defined_names'
File "path\venvtest\lib\site-packages\openpyxl\reader\workbook.py", line 109, in assign_names sheet.defined_names[name] = defn AttributeError: 'ReadOnlyWorksheet' object has no attribute 'defined_names'
import pandas as pd
folder = 'C:\\Path\\of\\folder\\'
FILES = [
folder + 'file1.xlsx',
folder + 'file2.xlsx',
folder + 'file3.xlsx'
]
for file in FILES :
#make sure the file is readable by pandas
try :
sheet_name = 'Inventory Summary'
df = pd.read_excel(file, sheet_name=sheet_name, engine='openpyxl')
print(f'file : OK')
except Exception as e:
print(f'file : ERROR : {file}')
print(e)
continue
I can obviously keep the old version for now, but I am curious to know if anyone has a clearer understanding of what is going on.