I am regularly receiving data from a source that is producing a non-standard Excel format which can't be read by readxl::read_excel
. Here is the github issue thread. Consequently I have a whole directory tree containing hundreds of (almost) Excel files that I would like to read into R and combine with plyr::ldply
The files can, however, be opened just fine by XLConnect::loadWorkbook
. But unfortunately, even with allocating huge amounts of memory for the Java virtual machine, it always crashes after reading a few files. I tried adding these three lines to my import function:
options(java.parameters = "-Xmx16g")
detach("package:XLConnect", unload = TRUE)
library(XLConnect)
xlcFreeMemory()
However, I still get:
Error: OutOfMemoryError (Java): Java heap space
All I need to do is resave them in Excel and then they read in just fine from readxl::read_excel
. I'm hoping I could also resave them in batch using XLConnect
and then read them in using readxl::read_excel
. Unfortunately, using Linux, I can't just script Excel to resave them. Does anyone have another workaround?