R read excel by column names

Question

So I have a bunch of Excel files that I want to loop through, reading specific, discontinuous columns into a data frame. Using readxl works for the basic stuff, like this:

library(readxl)
library(plyr)
wb <- list.files(pattern = "\\.xlsx?$")  # pattern is a regular expression, not a glob
dflist <- list()

for (i in wb){
  dflist[[i]] <- data.frame(read_excel(i, sheet = "SheetName", skip=3, col_names = TRUE))
}

# now put them into a data frame
data <- ldply(dflist, data.frame, .id = NULL)

This works (barely), but the problem is that my Excel files have about 114 columns and I only want specific ones. I also do not want to let R guess the col_types, because it gets some of them wrong (e.g. for a string column, if the first value starts with a number, it tries to interpret the whole column as numeric, and crashes). So my question is: how do I specify specific, discontinuous columns to read? The range argument uses the cellranger package, which does not allow for reading discontinuous columns. Is there any alternative?

Is it possible for you to save the excel files as csv files and import from them? — DarrenRhodes, Aug 15 '17 at 15:08
I have over 100 of them, each with multiple sheets so...no :) — Alex, Aug 15 '17 at 15:15

score 4 · Accepted Answer · edited Jun 20 '20 at 09:12

.xlsx >>> you can use library `openxlsx`

The read.xlsx function from library openxlsx has an optional parameter cols that takes a numeric index, specifying which columns to read.

It seems it reads all columns as characters if at least one column contains characters.

openxlsx::read.xlsx("test.xlsx", cols = c(2,3,6))
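Since cols takes numeric indices rather than names, one way to select by name is to read the header row first and translate names into positions. The sketch below is only an illustration: cols_by_name and read_xlsx_by_name are hypothetical helpers (not part of openxlsx), and the wrapper assumes the header starts in column A with no empty header cells.

```r
# Hypothetical helper: translate column names into the numeric
# indices that the `cols` argument of openxlsx::read.xlsx expects.
cols_by_name <- function(header, wanted) {
  idx <- match(wanted, header)
  if (anyNA(idx)) {
    stop("Columns not found: ", paste(wanted[is.na(idx)], collapse = ", "))
  }
  sort(idx)
}

# Example on a made-up header
cols_by_name(c("id", "name", "x", "y", "note"), c("note", "name"))

# Hypothetical wrapper: read only the header row, then re-read the
# sheet restricted to the matching columns.
# Assumes the header begins in column A and has no empty cells.
read_xlsx_by_name <- function(path, wanted, sheet = 1, header_row = 1) {
  header <- openxlsx::read.xlsx(path, sheet = sheet, rows = header_row,
                                colNames = FALSE, skipEmptyCols = FALSE)
  header <- as.character(unlist(header[1, ], use.names = FALSE))
  openxlsx::read.xlsx(path, sheet = sheet, startRow = header_row,
                      cols = cols_by_name(header, wanted))
}
```

For files like the ones in the question, where three rows are skipped before the header, a call might look like read_xlsx_by_name("test.xlsx", c("name", "note"), sheet = "SheetName", header_row = 4); the column names here are made up.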

.xls >>> you can use library `XLConnect`

The potential problem is that library XLConnect requires library rJava, which might be tricky to install on some systems. If you can get it running, the keep and drop parameters of readWorksheet() accept both column names and indices. Parameter colTypes deals with column types. This way it works for me:

options(java.home = "C:\\Program Files\\Java\\jdk1.8.0_74\\") #path to jdk
library(rJava)
library(XLConnect)
workbook <- loadWorkbook("test.xls")
readWorksheet(workbook, sheet = "Sheet0", keep = c(1,2,5))

Edit:

Library readxl works well for both .xls and .xlsx if you want to read a range (rectangle) from your excel file. E.g.

readxl::read_xls("test.xls", range = "B3:D8")
readxl::read_xls("test.xls", sheet = "Sheet1", range = readxl::cell_cols("B:E"))
readxl::read_xlsx("test.xlsx", sheet = 2, range = readxl::cell_cols(2:5))
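For discontinuous columns, readxl may also be usable without a range: col_types accepts "skip" for columns that should be dropped, which also avoids type guessing for the columns that are kept. A minimal sketch, assuming the header row can be read on its own with n_max = 0 and that no data extends beyond the header columns; make_col_types and read_named_cols are hypothetical helpers, not part of readxl.

```r
# Hypothetical helper: build a col_types vector for readxl::read_excel.
# Columns named in `wanted` are read as `type`; all others are skipped.
make_col_types <- function(header, wanted, type = "text") {
  missing <- setdiff(wanted, header)
  if (length(missing) > 0) {
    stop("Columns not found: ", paste(missing, collapse = ", "))
  }
  ifelse(header %in% wanted, type, "skip")
}

# Example on a made-up header
make_col_types(c("id", "name", "x", "y", "note"), c("name", "note"))
# "skip" "text" "skip" "skip" "text"

# Hypothetical wrapper: read the header only, then re-read the sheet
# with an explicit col_types vector so nothing is guessed.
read_named_cols <- function(path, sheet, wanted, skip = 0) {
  header <- names(readxl::read_excel(path, sheet = sheet,
                                     skip = skip, n_max = 0))
  readxl::read_excel(path, sheet = sheet, skip = skip,
                     col_types = make_col_types(header, wanted))
}
```

Here every kept column is read as text, so any numeric conversion would have to happen afterwards; passing a different type, or a per-column vector, is a possible variation.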

Thanks. Yes I thought of using that but my files are in `xls` format, not `xlsx` so the package cannot handle that. — Alex, Aug 15 '17 at 15:44
@Alex I updated my answer and added a solution that should work for .xls files. — Ape, Aug 16 '17 at 13:20
