5

I'm getting really slow read times using read_excel on a particular machine with a particular xlsm file. Reading the same xlsm file across 2 different machines I get read times of 72 seconds versus 0.2 seconds. Where the slower machine actually has more computational power.

Both machines are Windows and I'm using readxl package version 1.1.0 on both machines. I've tried using package v1.2.0 on the slower machine and the same problem persists.

I had been reading this xlsm file into R for the last 2 years+ without any problems. I had been regularly running a set of macros on the excel file over those 2 years. Now, however, when I run those same macros and save the file, the slower machine now take 70s instead of <1s to read it into R. If I ask the slower machine to read older versions of the xlsm file it has no problems. The file is approximately 15MB. It doesn't matter which tab of the xlsm file I ask read_excel to read, the time is still about 70 seconds.

I've updated R (v3.5.2) and ensured all the readxl package dependencies are up to date too. I'm at a loss as to what else to try, any ideas would be much appreciated.

Will T-E
  • 607
  • 1
  • 7
  • 16
  • A shot in the dark: maybe one of the alternatives `openxlsx::read_xlsx` or `openxlsx::loadWorkbook` offer better performance (I seem to recall reading `.xlsm;.xlsx;.xlsb` files with `openxlsx`, only `.xls` was not supported as far as I know) – niko Dec 29 '18 at 09:27
  • what does `readxl::excel_format()` say? – Wimpel Dec 29 '18 at 09:36
  • @Wimpel on both machines 'xlsx'. – Will T-E Dec 29 '18 at 09:40
  • thanks @nate.edwinton. Using `openxlsx::read.xlsx` I get 9 seconds read time on slower machine and 1.7 seconds on faster machine. – Will T-E Dec 29 '18 at 09:41
  • Does opening the file in Excel (or LibreOffice's Calc) demonstrate the same difference in opening times? – r2evans Dec 29 '18 at 09:44
  • @r2evans, on the slower machine it takes 46 seconds to open, on the faster machine 21 seconds. – Will T-E Dec 29 '18 at 09:48
  • That is perplexing, I was thinking it might be related to something related to Windows. (This might still be the case, but is less obvious now to me.) – r2evans Dec 29 '18 at 09:57
  • @r2evans I could try a complete reinstall of windows, I have already tried a re-install of MSOffice. – Will T-E Dec 29 '18 at 10:05
  • I have no reason (other than curiosity) to think that that would help in this situation. Then again, I can't think of anything else. Sorry, good luck. – r2evans Dec 29 '18 at 10:09

1 Answers1

1

Edit: When I tested my below solution it worked once and then never again after that.

New solution: The only real solution I can find is reverting to an older version of readxl.

1) Install RTools 3.5 - https://cran.r-project.org/bin/windows/Rtools/

2) Install devtools package - install.packages("devtools")

3) Install readxl version 1.2.0 - devtools::install_version("readxl", "1.2.0")

Incorrect solution:

A similar thing has been happening to me and I recently managed to "fix" it by deleting all of the workbook/worksheet 'names' in the excel file before trying to load it in with readxl.

Some background: People had added a bunch of junk names referencing external connections and I'm wondering if readxl was trying to resolve these before loading in the data causing it to hang. I don't know why it only hangs on my current computer while the same file won't hang when loading on another computer.

  • Thanks for you input Anthony, what do you mean by 'junk names'? – Will T-E Aug 23 '19 at 16:45
  • 1
    In excel if you go to Formulas -> Name Manager you will see a list of all the names that exist in your workbook. There were many that were #REF errors or just no longer being used for anything. – Anthony Bucci Aug 27 '19 at 15:10