7

To avoid "duplicate" close request: I know how to read Excel named ranges; examples are given in the code below. This is about "real" tables in Excel.

Excel2007 and later have the useful concept of tables: you can convert ranges to tables, and avoid hassles when sorting and rearranging. When you create a table in an Excel range, it gets a default name (Tabelle1 in German Version, TableName in the following example), but you can additionally simply name the range of the table (TableAsRangeName); as indicated by the icons in the Excel range name editor, these two seem to be treated differently.

I have not been able to read these tables (in the strict sense) from R. The only known workaround is using CSV intermediate, or converting the table to a normal named range, which has nasty irreversible side effects when you use column names in cell references; these are converted to A1 notation.

The example below shows the problem. You mileage may vary with different combinations of 32/64 bit ODBC drivers and 32/64 bit Java

# Read Excel Tables (not simply named ranges)
# Test Computer: 64 Bit Windows 7, R 32 bit  
# My ODBC drivers are 32 bit
library(RODBC)
# Test file has three ranges
# NonTable Simple named range
# TableName Name of table 
# TableAsRangeName Named Range covering the above table
sampleFile = "ExcelTables.xlsx"
if (!file.exists(sampleFile)){
  download.file("http://www.menne-biomed.de/uni/ExcelTables.xlsx",sampleFile)
  # Or do it manually, if this fails
}
# ODBC
channel = odbcConnectExcel2007(sampleFile)
sqlQuery(channel, "SELECT * from NonTable") # Ok
sqlQuery(channel, "SELECT * from TableName") # Could not find range
sqlQuery(channel, "SELECT * from TableAsRangeName") # Could not find range
close(channel)

# gdata has read.xls, but seems not to support named regions

library(xlsx)
wb = loadWorkbook(sampleFile)
getRanges(wb) # This one fails already with "TableName" does not exist
ws = getSheets(wb)[[1]]
readRange("NonTable",ws) # Invalid range address
readRange("TableName",ws) # Invalid range address
readRange("TableAsRangeName",ws) # Invalid range address

# my machine requires 64 bit for this one; depends on your Java installation
sampleFile = "ExcelTables.xlsx"
library(XLConnect) # requires Java
readNamedRegionFromFile(sampleFile,"NonTable") # OK
readNamedRegionFromFile(sampleFile,"TableName") # "TableName" does not exist
readNamedRegionFromFile(sampleFile,"TableAsRangeName") # NullPointerException

wb <- loadWorkbook(sampleFile)
readNamedRegion(wb,"NonTable") # Ok
readNamedRegion(wb,"TableName") # does not exist
readNamedRegion(wb,"TableAsRangeName") # Null Pointer
Dieter Menne
  • 10,076
  • 44
  • 67
  • My bet is that the simplest way is to go back to the Excel Workbook and create a new sheet which contains a simple set of links (or formulas) to the named table's cells. Then tell `R` to read from that worksheet. – Carl Witthoft Jul 16 '13 at 11:56
  • XLConnect does not yet support Excel tables. Also, named ranges based on table formulas are not yet supported. However, I'll be looking into this to see what can be done on that front. – Martin Studer Jul 16 '13 at 20:29
  • Thanks, Martin Studer (author of XLConnect). I yesterday did some digging in XML and found that tables are mapped to Ranges in an extra directory. Please post here in case you get it to work. – Dieter Menne Jul 17 '13 at 05:50
  • The excel file on the link given is corrupt. Please check if it is ok on your side. – user1609452 Jul 17 '13 at 07:47
  • Sorry and thanks. Corrected, please try again. – Dieter Menne Jul 17 '13 at 08:11
  • @CarlWitthoft: Since no solution is in sight, I tried your's (creating a link). However, this Insane Excel converts links to empty fields to zero. – Dieter Menne Jul 24 '13 at 17:08
  • 1
    Just in case someone else tries this: you must use an IF(ISBLANK.. construct to avoid the empty cells. – Dieter Menne Jul 24 '13 at 17:15
  • @DieterMenne yeah, Excel's inability to treat cells with nothing the same as cells you never touched is freaking insane. One more reason to go evangelistic and get all your friends and cow-orkers to stop using it. – Carl Witthoft Jul 25 '13 at 12:30

3 Answers3

4

I've added some initial support for Excel tables in XLConnect. Please find the latest changes on github at https://github.com/miraisolutions/xlconnect

In the following a small sample:

require(XLConnect)
sampleFile = "ExcelTables.xlsx"
wb = loadWorkbook(sampleFile)
readTable(wb, sheet = "ExcelTable", table = "TableName")

Note that Excel tables are associated to a sheet. So as far as I can see it's possible to have multiple tables with the same name associated to different sheets. For this reason there is a sheet-argument to readTable.

Martin Studer
  • 2,213
  • 1
  • 18
  • 23
  • Looks like there is one of the nasty 64/32 bit dependency problems with RJava. I have the toolchain installed and use it regularly, but could not get xlconnect build; tried both 32 bit and 64 bit R. error: unable to load shared object 'D:/R/R/library/rJava/libs/i386/rJava.dll': LoadLibrary failure: %1 ist keine zulässige Win32-Anwendung. Will check tomorrow. – Dieter Menne Jul 17 '13 at 20:29
  • See https://github.com/miraisolutions/xlconnect/issues/21 and https://github.com/miraisolutions/xlconnect/issues/20 – Dieter Menne Jul 18 '13 at 09:26
  • This is great, how difficult is it to be able to flush & write back (/append) to an Excel table? Can't seem to get it to work, tried the writeNamedRegion(). – MattV Jul 15 '18 at 20:24
3

You are correct that the table definitions are stored in XML.

sampleFile = "ExcelTables.xlsx"
unzip(sampleFile, exdir = 'test')
library(XML)
tData <- xmlParse('test/xl/tables/table1.xml')
tables <- xpathApply(tData, "//*[local-name() = 'table']", xmlAttrs)
[[1]]
            id           name    displayName            ref totalsRowShown 
           "1"    "TableName"    "TableName"        "G1:I4"            "0" 
library(XLConnect)

readWorksheetFromFile(sampleFile, sheet = "ExcelTable", region = tables[[1]]['ref'], header = TRUE)
    Name Age AgeGroup
1  Anton  44        4
2 Bertha  33        3
3  Cäsar  21        2

Depending on your situation you could search in the XML files for appropriate quantities.

user1609452
  • 4,406
  • 1
  • 15
  • 20
  • I had just implemented something similar to your solution, with some additional stuff from http://housesofstones.com/blog/2013/06/20/quickly-read-excel-xlsx-worksheets-into-r-on-any-platform/#.UeaP6417IzY because I feared readWorksheetFromFile would stumble over my too big real file. – Dieter Menne Jul 17 '13 at 12:38
  • An interesting blog. Used to be that you could remove the password protection on excel files by making some amendments to the underlying xml files. – user1609452 Jul 17 '13 at 12:49
  • Sorry for transferring the credit to Martin. I gave you a +1 instead. – Dieter Menne Jul 17 '13 at 20:11
  • @user1609452, your answer saved me a lot of time. what if your Excel workbook have multiple sheets? Is there a way to identify in what sheet your table is stored? – rjss Nov 18 '20 at 17:37
0

Later addition:

readxl::readxl can read "real" tables and probably is the least troublesome solution when you want to read data frames/tibbles.

** After @Jamzy comment ** I tried again and could not read named ranges. False positive then or false negative now???

Dieter Menne
  • 10,076
  • 44
  • 67
  • 1
    could you please elaborate? I can't find any mention of this in the documentation. – Jamzy Jul 07 '17 at 01:51
  • No, I did not find in docs, just tried. On the other side, the fact that most other readers did not support "real" tables was not documented either. Have you tried? Any other experience? – Dieter Menne Jul 07 '17 at 18:17