1
  • File: it is a file with about 100 rows and 7 columns. XLS extension (MS Excel 97-2003).
  • Issue: unable to read in the file with R through read.xlsx or read.xlsx2.
  • Code used to try to read the file:

    library(xlsxjars)
    library(rJava)
    library(xlsx) 
    excel <- read.xlsx("File.xls",sheetIndex=1,startRow=1,stringsAsFactor=F)
    
  • Error prompted:

Error in .jcall ("RJavaTools","Ljava/lang/object;","invokeMethod",cl, : java.lang.IllegalArgumentException: Your InputStream was neither an OLE2 stream, nor an OOXML stream.

PD: for the record, I did try to read it with read.csv and it does read it but since it's an Excel file and it's not separated by commas or dots, R reads it as if everything is in 1 column. Maybe anyone can suggest a way to read it through read.csv?

MichaelChirico
  • 33,841
  • 14
  • 113
  • 198
Victor
  • 1,163
  • 4
  • 25
  • 45
  • Adjust the `sep` argument in `read.csv` or use `read.table` – Rich Scriven Nov 22 '14 at 17:37
  • what sep would you use? in Excel it is separated by columns, not any symbol in particular... – Victor Nov 22 '14 at 17:38
  • 2
    Use a space `sep = " "` or possibly a tab `sep = "\t"` – Rich Scriven Nov 22 '14 at 17:39
  • sep="\t" solved it! thanks!! – Victor Nov 22 '14 at 17:40
  • 5
    It sounds like what you had was not, in fact, an XLS formatted file (neither OOXML nor BIFF(?)), but rather a tab delimited text file. There is a trick whereby a file in a format that excel can understand is given the `.xls` extension so that Excel is opened when you open it via a GUI. Try looking at the file in a text editor to see what it really is (or use the `file` command line utility if it exists in you operating system). – Brian Diggs Nov 22 '14 at 18:12
  • @BrianDiggs is spot-on: if `read.csv` worked at all, then it's **not** an Excel file. Your operating system may *associate* it with Excel, but that's a Charlie Foxtrot for another day. Just use `read.csv` or `read.table` after determining what the separator is. – Carl Witthoft Nov 22 '14 at 18:46
  • This seems on the boundary of "simple problem; not worthy of answer" versus not answered yet. I'm voting to close but actually think Richard or Brian should post the answer since I think the problem (which is really an OS question) happens all the time. – IRTFM Nov 22 '14 at 19:49
  • @BondedDust Thought I'd give Richard a fair shot at answering it since he figured out the solution. But since he hasn't yet, I did. – Brian Diggs Nov 22 '14 at 22:30

2 Answers2

5

There are two things you have stated that make me think that you are not dealing with the type of file you think you are dealing with:

  1. Your InputStream was neither an OLE2 stream, nor an OOXML stream"
    
  2. "I did try to read it with read.csv() and it does read it ..."

If it really was a file in the MS Excel 97-2003 native format(s), read.csv would not have worked at all on it. The error from read.xlsx supports this. This appears to be a text file, and your comment that using sep="\t" in read.table allows it to be read correctly confirms it. What you have is a tab separated text file that happens to have the suffix .xls.

It should be noted that the suffix of a file is not a guarantee that the file is of the purported format. A much better indicators is running the file shell command on it (if that command is available on your operating system). File extensions are sometimes used by GUIs to determine which application to use to open the file with. As Excel is capable of reading a tab delimited file (and because Excel prorbably is not set to be the default application for a .tsv file, which would be the more traditional file extension), by using a .xls extension, the file will (generally) be opened by Excel.

In summary, just because the filename ends in .xls does not mean it really is an Excel native format file.

Brian Diggs
  • 57,757
  • 13
  • 166
  • 188
  • This is a much better explanation than what I could give anyway. :) You taught me something, while all I did was solve the immediate problem. – Rich Scriven Nov 22 '14 at 23:01
1

You can use this function, to read XML spreadsheets saved in 2003 .XLS format

    readExcelXML <- 
    function(filename) {
    library(xml2)
    library(magrittr)
    doc <- read_xml(filename)
    ns <- xml_ns(doc)
    rows <- xml_find_all(doc, paste0(".//ss:Worksheet/ss:Table/ss:Row"), ns = ns)
    values <- lapply(rows, . %>% xml_find_all(".//ss:Cell/ss:Data", ns = ns) %>%                 
    xml_text %>% unlist)

    columnNames <- values[[1]]

    dat <- do.call(rbind.data.frame, c(values[-1], stringsAsFactors = FALSE))
    names(dat) <- columnNames

    dat
    }
Heymloul
  • 61
  • 1
  • 4