As a learning project for Python, I am attempting to read all Excel files in a directory and extract the names of all the sheets.
I have been trying several available Python modules to do this (pandas in this example), but I am running into an issue because most of them depend on openpyxl.
This is my current code:
import os
import pandas

directory_root = 'D:\\testFiles'

# Dict to hold all files, stats
all_files = {}

for _current_path, _dirs_in_path, _files_in_path in os.walk(directory_root):
    # Add all files to this `all_files`
    for _file in _files_in_path:
        # Extract filesystem stats from the file
        _stats = os.stat(os.path.join(_current_path, _file))
        # Add the full file path and its stats to the `all_files` dict.
        all_files[os.path.join(_current_path, _file)] = _stats

# Loop through all found files to extract the sheet names
for _file in all_files:
    # Open the workbook
    xls = pandas.ExcelFile(_file)
    # Loop through all sheets in the workbook
    # (`sheet_names` is a list attribute, not a method)
    for _sheet in xls.sheet_names:
        print(_sheet)
This raises an error from openpyxl when calling pandas.ExcelFile(): ValueError: Max value is 14.
From what I can find online, this is because the file contains a font with a family value above 14. How do I read from an Excel (xlsx) file while disregarding any existing formatting?
The only potential solution I could find suggests modifying the original file and removing the formatting, but this is not an option as I do not want to modify the files in any way.
Is there another way to do this that doesn't have this formatting limitation?
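For reference, here is a minimal sketch of one direction that avoids parsing styles at all. It relies on an .xlsx file being a ZIP archive and reads the sheet names straight from the workbook XML using only the standard library. The helper name `sheet_names_from_xlsx` is hypothetical, and the sketch assumes the workbook part lives at the conventional path `xl/workbook.xml` and uses the standard (non-strict) SpreadsheetML namespace. It will not work for legacy .xls files.

```python
import zipfile
import xml.etree.ElementTree as ET

# Assumption: the file uses the standard (transitional) SpreadsheetML namespace.
MAIN_NS = "http://schemas.openxmlformats.org/spreadsheetml/2006/main"


def sheet_names_from_xlsx(path):
    """Return the sheet names listed in the workbook XML (hypothetical helper)."""
    # An .xlsx file is a ZIP archive; only the workbook part is opened,
    # so the styles part (where the font definitions live) is never read.
    with zipfile.ZipFile(path) as archive:
        # Assumption: the workbook part is at the conventional location.
        with archive.open("xl/workbook.xml") as workbook_xml:
            root = ET.parse(workbook_xml).getroot()
    # Each <sheet> element under <sheets> carries its name as an attribute.
    return [
        sheet.attrib["name"]
        for sheet in root.findall(f"{{{MAIN_NS}}}sheets/{{{MAIN_NS}}}sheet")
    ]
```

In the loop above, this could stand in for the pandas.ExcelFile() call, for example `for _sheet in sheet_names_from_xlsx(_file): print(_sheet)`, provided non-xlsx files are skipped first. The files are only opened for reading, so they are not modified.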