
So I'm just trying to read a pretty normal .xlsx file with pandas, but for some reason it doesn't work. To be clear, I don't get any errors, just this warning:

UserWarning: Workbook contains no default style, apply openpyxl's default warn("Workbook contains no default style, apply openpyxl's default")

I've tried specifying engine='openpyxl' and the sheet name, but neither one fixed the problem. I really don't know what the problem is, since the code is just one line:

df = pd.read_excel('data/RESULT_LIST_ANONYM_ENERGY_MARKET_aFRR_2021-12-01_2021-12-31.xlsx')

Here are the versions of the packages that might be relevant:

  • pandas: 1.5.0
  • openpyxl: 3.0.10
maxxel_
  • What does `it doesn't work` mean? Is `df` empty? Does that file contain anything? Does the file contain multiple sheets perhaps? – Panagiotis Kanavos Oct 07 '22 at 11:24
  • Please look at this [topic](https://stackoverflow.com/questions/66214951/how-to-deal-with-warning-workbook-contains-no-default-style-apply-openpyxls) may be it will help you. – Ange Guillaume Koffi Oct 07 '22 at 11:29
  • @PanagiotisKanavos it basically just gets stuck, doesn't throw an error but also doesn't go past that line of code. The only thing I think I've figured out is that it keeps coming back to lines 89-92 of openpyxl->worksheet->_read_only.py. I think it for some reason keeps incrementing both counter and idx but I don't know why. – maxxel_ Oct 07 '22 at 11:40
  • `doesn't go past that line of code.` So the code is either slow or freezes. How many rows are there? Does the file contain a lot of empty rows perhaps? Python can be 10 times slower than other languages when loading Excel files. I've encountered script-generated files that have only 200 rows of data and 1M empty rows. Pandas can take 10 times longer than a C# library to load such files. If you open the file in Excel and save it, Excel could remove all the empty rows. – Panagiotis Kanavos Oct 07 '22 at 11:52
  • @PanagiotisKanavos yeah, it wasn't empty rows; the table itself just had 160k rows with 6 columns, so I simply overestimated how fast it would read in. Thank you though! :) – maxxel_ Oct 07 '22 at 11:56
  • If the file is generated by a script you may be able to reduce its size and improve performance by loading it in Excel and saving again. Most scripts and libraries store eg string values directly in the row. Excel will store them in a lookup "table" and only include the reference in each cell. This can reduce space dramatically, and speed up loading. – Panagiotis Kanavos Oct 07 '22 at 12:02
  • @PanagiotisKanavos I just went ahead and saved the file as a .csv, and I'm amazed at how much faster it reads the table (see the sketch after these comments). Now it works like a charm. – maxxel_ Oct 07 '22 at 12:09
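
For reference, a minimal sketch of the one-time conversion mentioned in the last comment. The .xlsx path matches the question; the output name data/result_list.csv is just a placeholder:

import pandas as pd

# One-time conversion: read the slow .xlsx once and cache it as CSV.
xlsx_path = 'data/RESULT_LIST_ANONYM_ENERGY_MARKET_aFRR_2021-12-01_2021-12-31.xlsx'
csv_path = 'data/result_list.csv'  # placeholder output name

df = pd.read_excel(xlsx_path, engine='openpyxl')
df.to_csv(csv_path, index=False)

# Later runs read the much faster CSV instead.
df = pd.read_csv(csv_path)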

3 Answers


Try explicitly passing the openpyxl engine:

df = pd.read_excel('data/RESULT_LIST_ANONYM_ENERGY_MARKET_aFRR_2021-12-01_2021-12-31.xlsx', engine='openpyxl')
brenodacosta

OK, so I put a print statement into line 90 of openpyxl->worksheet->_read_only.py to see the progress. It turns out the file was just much bigger than I thought, and it simply takes about half a minute to read the whole table.
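
For anyone who wants to watch the progress without editing openpyxl's source, something along these lines should work (an untested sketch; the 10,000-row print interval is arbitrary):

from openpyxl import load_workbook
import pandas as pd

path = 'data/RESULT_LIST_ANONYM_ENERGY_MARKET_aFRR_2021-12-01_2021-12-31.xlsx'

# read_only mode streams rows instead of loading the whole sheet into memory
wb = load_workbook(path, read_only=True)
ws = wb.active

rows = ws.iter_rows(values_only=True)
header = next(rows)  # first row as column names

data = []
for i, row in enumerate(rows, start=1):
    data.append(row)
    if i % 10000 == 0:  # arbitrary progress interval
        print(f'{i} rows read')

wb.close()
df = pd.DataFrame(data, columns=header)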

maxxel_

Try using the warnings module to suppress the warning:

import warnings
import pandas as pd

# Catch the warning instead of letting it print to the console
with warnings.catch_warnings(record=True):
    warnings.simplefilter("always")
    df = pd.read_excel('data/RESULT_LIST_ANONYM_ENERGY_MARKET_aFRR_2021-12-01_2021-12-31.xlsx', engine="openpyxl")
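
Alternatively, if the goal is only to silence that specific message rather than record all warnings, a filter on the message text from the question should also work:

import warnings
import pandas as pd

# Ignore only the "no default style" warning emitted by openpyxl
warnings.filterwarnings("ignore", message="Workbook contains no default style")

df = pd.read_excel('data/RESULT_LIST_ANONYM_ENERGY_MARKET_aFRR_2021-12-01_2021-12-31.xlsx', engine="openpyxl")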