3

I have a collection of excel files which I am importing into R.

The files contain hidden data which I would like to disregard -- eg, simply not importing it, or importing it with a flag indicating it was hidden so that I can then drop it.

The files contain two types of hidden data:

  • Complete sheets are hidden
  • Specific Rows within a sheet are hidden.

Is there a way to identify when data in excel is hidden?

Right now I am using the gdata package, but am happy to use XLConnect or other package

Sample Code:

library(gdata)
xlsfile <- "test.xls"

# grab all the sheet names. 
# This is giving me both hidden & non-hidden sheets.  I would like only the latter
sheets <- sheetNames(xlsfile)

# read in the xls file, by sheet 
xlData <- 
  lapply(sheets, function(s) 
      read.xls(xlsfile, sheet=s, stringsAsFactors = FALSE))

if needed, I can create a dummy xls file and post it.

Ricardo Saporta
  • 54,400
  • 17
  • 144
  • 178
  • 2
    My commiseration for being stuck in this circle of hell. I think it might be easier to do this from the other end, i.e., write a VBA script that exports non-hidden data as CSV. But I cannot offer more specific advice, as I would reject these files on principle. – Roland Oct 02 '13 at 15:26
  • You probably already know this but you can check if visible before copying data: if Sheets(ws).Visible = false then copy... – Automate This Oct 02 '13 at 15:35
  • @PortlandRunner Is the `visible` flag something that is (or could be) exported? – Ricardo Saporta Oct 02 '13 at 15:36
  • Thanks @Roland. If only rejecting the files were an option, but the data inside is critical. Going the VBA route might end up being the most direct – Ricardo Saporta Oct 02 '13 at 15:37

1 Answers1

2

XLConnect has a nice function called isSheetHidden which does what you want. Assuming Sheet2 is hidden:

library(XLConnect)
xlsfile <- "Book1.xls"
wb <- loadWorkbook(xlsfile, create = TRUE)
isSheetHidden(wb, "Sheet1") # FALSE
isSheetHidden(wb, "Sheet2") # TRUE

In gdata you would have to write your own function that calls the underlying perl package to access the sheet property, but it is possible.

nograpes
  • 18,623
  • 1
  • 44
  • 67
  • Thats great, thanks! now if only they had something similar for rows...? – Ricardo Saporta Oct 02 '13 at 16:54
  • Yeah, `isHidden` doesn't work for rows. So, `XLConnect` wraps HFFS (Horrible Spreadsheet Format), which is a nice Java library. The `XLConnect` wrapper is [posted to Github](https://github.com/miraisolutions/xlconnect-java). The [API for HFFS seems to indicate that they have a `getHidden` for rows](http://poi.apache.org/apidocs/org/apache/poi/hssf/record/ColumnInfoRecord.html#getHidden%28%29). It may not be that hard to slightly edit the wrapper so you expose the `getHidden` function. You might even ask the author to do it for you :) – nograpes Oct 02 '13 at 18:52
  • 1
    You can do the row import with resepct to being hidden using `rcom` http://rcom.univie.ac.at/ – Hansi Oct 03 '13 at 09:01
  • 1
    @Hansi Can you provide an example (perhaps as an answer?) – nograpes Oct 03 '13 at 17:45