I'm using read_excel
for speed and simplicity to import an Excel file.
Unfortunately, there's as yet no capability of excluding selected columns which won't be needed from the data set; to save effort, I am naming such columns "x"
with the col_names
argument, which is easier than trying to keep track of x1
, x2
, and so on.
I'd then like to exclude such columns on the fly if possible to avoid an extra step of copying, so in pseudocode:
read_excel("data.xlsx", col_names = c("x", "keep", "x"))[ , !"x"]
We can use the sample data set included with the readxl
package for illustration:
library(readxl)
DF <- read_excel(system.file("extdata/datasets.xlsx", package = "readxl"),
col_names = c("x", "x", "length", "width", "x"), skip = 1L)
The approaches I've seen that work don't exactly work on the fly, e.g., having stored DF
, we can now do:
DF <- DF[ , -grep("^x$", names(DF))]
This works but requires making a copy of DF
by storing it, then overwriting it; I'd prefer to remove the columns in the same command as read_excel
to allocate DF
properly ab initio.
Other similar approaches require declaring temporary variables, which I prefer to avoid if possible, e.g.,
col_names <- c("x", "x", "length", "width", "x")
DF <- read_excel(system.file("extdata/datasets.xlsx", package = "readxl"),
col_names = col_names, skip = 1L)[ , -grep("^x$", col_names)]
Is there a way to axe these columns without creating unnecessary temporary variables?
(I could convert to data.table
, but am wondering if there's a way to do so without data.table
)