
I've got code that reads data from many Excel files in one folder. Here is the line responsible for reading the data:

d[file] = pd.read_excel(filenames[file], nrows=ilosc_wierszy, usecols=range(kol_odp+1)).fillna("")

Some files are 1 601 KB while others are just 21 KB. I don't know why that is, since the original file should be the same; in any case, every file contains the same data. So how is it that Python reads the bigger files in about 30 seconds and the smaller ones in less than a second? In the line above I specified that I need only nrows rows and usecols columns, so I thought Python would read just that and move on to the next file. Why does it take so long, and is there a way to make it faster?
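For context, the line above sits in a loop roughly like this (a simplified sketch; load_folder and the folder handling are made-up here, the other variable names are the ones from the line above):

    import os
    import pandas as pd

    def load_folder(folder, ilosc_wierszy, kol_odp):
        # Collect every Excel workbook in the folder, keyed by file name
        filenames = {f: os.path.join(folder, f)
                     for f in os.listdir(folder) if f.endswith(".xlsx")}
        d = {}
        for file in filenames:
            # Only the first ilosc_wierszy rows and kol_odp+1 columns are needed
            d[file] = pd.read_excel(filenames[file], nrows=ilosc_wierszy,
                                    usecols=range(kol_odp + 1)).fillna("")
        return d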

  • Larger files contain more bytes and thus take longer to be read and processed. – Jan Christoph Terasa Apr 26 '21 at 20:59
  • 1
    It has to read the entire Excel file into memory before it can do any operations. Excel files aren't stored in a way that lets you jump to a specific cell without all the other trappings. – Tim Roberts Apr 26 '21 at 20:59
  • maybe this helps you: https://stackoverflow.com/questions/44654906/parallel-excel-sheet-read-from-dask – n4321d Apr 26 '21 at 21:01
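
Following up on the parallel-read suggestion in the last comment, here is a minimal standard-library sketch of the same idea (ProcessPoolExecutor instead of dask; load_one and load_all are made-up names, the other variables come from the question):

    from concurrent.futures import ProcessPoolExecutor
    import pandas as pd

    def load_one(path, nrows, ncols):
        # Parsing an .xlsx is CPU-bound, so separate processes help more than threads
        return path, pd.read_excel(path, nrows=nrows,
                                   usecols=range(ncols)).fillna("")

    def load_all(paths, nrows, ncols):
        # On Windows this must be called under an `if __name__ == "__main__":` guard
        with ProcessPoolExecutor() as pool:
            return dict(pool.map(load_one, paths,
                                 [nrows] * len(paths), [ncols] * len(paths)))

    # d = load_all(list(filenames.values()), ilosc_wierszy, kol_odp + 1)

This spreads the per-file parsing cost across CPU cores, but each workbook still has to be fully unzipped and parsed, as the comments point out.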
