2

I am trying to create a function that reads in all sheets in an excel workbook using readxl::read_excel and binds them into a single data frame, and allows me to pass through additional arguments to read_excel. I can do the first part fine, but not the second part.

library(magrittr)

# example excel workbook with multiple sheets
path <- readxl::readxl_example("datasets.xlsx")

# function with simple forwarding
read_all <- function(path, ...) {

  path %>%
    readxl::excel_sheets() %>%
    rlang::set_names() %>%
    purrr::map_df(~ readxl::read_excel(path = path, sheet = .x, ...))

}

# errors with and without additional arguments
read_all(path)
read_all(path, skip = 5)

I should get back a single file, instead I get an error:

Error: Can't guess format of this cell reference: iris
In addition: Warning message: Cell reference follows neither the A1 nor R1C1 format. Example: iris NAs generated.

Without argument passing the function works fine:

# Function works without passing extra params
read_all_0 <- function(path) {

  path %>%
    readxl::excel_sheets() %>%
    rlang::set_names() %>%
    purrr::map_df(~ readxl::read_excel(path = path, sheet = .x))

}

read_all_0(path)

Argument passing works fine in a simple function without purrr::map_df

read_test <- function(path, ...) {

  path %>% readxl::read_excel(...)
}
read_test(path, skip = 10)
camille
  • 16,432
  • 18
  • 38
  • 60
Lief Esbenshade
  • 793
  • 4
  • 13
  • Can you try: (1) using a normal anonymous function in your map call `function(x) {}` instead of the lamda function notation `~`. If the error still shows (2) omit the pipe after `set_names` and use an intermediate variable instead. – TimTeaFan Oct 04 '19 at 22:36
  • no dice. Still getting the same error message. `doc <- path %>% readxl::excel_sheets() %>% rlang::set_names()` and `purrr::map_df(doc, function(x) {readxl::read_excel(path = path, sheet = .x, !!!args)})` }``` – Lief Esbenshade Oct 04 '19 at 23:26
  • You misspecified the anonymous function. If you use `function(x)` you have to replace `.x` with `x`. See my response below. – TimTeaFan Oct 05 '19 at 08:09

2 Answers2

2

A possible solution is to create a named function which only takes one argument and pass it to map so that the only argument is the vector/list you are looping over.

Applied to your problem a solution would look like this:

# function with forwarding
read_all <- function(path, ...) {

  # function within function that sets the arguments path and ellipsis as given and only leaves sheet to be determined
  read_xl <- function(sheet) {
    readxl::read_excel(path = path, sheet, ...)
  }

  path %>%
    readxl::excel_sheets() %>%
    rlang::set_names() %>%
    purrr::map_df(read_xl)

}

# this allows you to pass along arguments in the ellipsis correctly
read_all(path)
read_all(path, col_names = FALSE)

It seems this problem is stemming from an improper environment handling of the purrr::as_mapper function. To circumvent this, I suggested using an anonymous function in the comments. Apparently, the approach below works as well.

read_all <- function(path, ...) {

  path %>%
    readxl::excel_sheets() %>%
    rlang::set_names() %>%
    purrr::map_df(function(x) {
                      readxl::read_excel(path = path, sheet = x, ...)
                   })

}

To verify that it is really the as_mapper function that is causing the problem, we can rewrite the named function-in-function from above using as_mapper. This again yields errors with and without additional arguments in the ellipsis.

# function with forwarding
read_all <- function(path, ...) {

  # named mapper function
  read_xl <- purrr::as_mapper(~ readxl::read_excel(path = path, sheet = .x, ...))

  path %>%
    readxl::excel_sheets() %>%
    rlang::set_names() %>%
    purrr::map_df(read_xl)

} 

Update Knowing that as_mapper is causing the issue allows us to dig deeper into the problem. Now we can inspect in the RStudio debugger what is happening under the hood when running a simple mapper version of read_excel:

read_xl <- purrr::as_mapper(~ readxl::read_excel(path = .x, sheet = .y, ...))
debugonce(read_xl) 
read_xl(path, 1)

It seems that when the ellipsis is included in the mapper function, as_mapper maps the first argument not only to .x but also automatically to the ellipsis .... We can verify this by creating a simple mapper function paster taking two arguments .x and ....

paster <- purrr::as_mapper(~ paste0(.x, ...))
paster(1)
> [1] "11"
paster(2)
> [1] "22"

The question now is: is there another way we are supposed to use ellipsis in mapper functions or is this a bug.

TimTeaFan
  • 17,549
  • 4
  • 18
  • 39
0

I would have thought the following would work:

read_all <- function(path, ...) {

  path %>%
    readxl::excel_sheets() %>%
    purrr::set_names() %>%
    map_df(~readxl::read_excel(path=path, sheet=.x), ...)

}

because the map family has a ... argument for passing additional arguments to the mapped function. However, the following code ignores the n_max argument and still returns all rows of the various data frames instead of a data frame with 8 rows (2 rows from each of the four sheets):

p <- readxl_example("datasets.xlsx")
read_all(p, n_max=2)

However, this works:

read_all <- function(path, ...) {

  path %>% 
    excel_sheets() %>% 
    set_names() %>%
    map_df(read_excel, path=path, ...)

}

p <- readxl_example("datasets.xlsx")
read_all(path=p, n_max=2)

In the above, path and any additional arguments in ... are passed to read_excel and (apparently) the sheet name (which would be the .x if we used it explicitly) is implicitly passed to the sheet argument, I guess because the path argument, which is first, has already been provided. I don't really get this and it doesn't seem like a particularly transparent approach, but I thought I'd put it out there in case someone else can explain what's going on and provide better code.

eipi10
  • 91,525
  • 24
  • 209
  • 285
  • Thanks. I also want to understand exactly whats going on. From the documentation for `map_df` under `Arguments: .f` `If a function, it is used as is. If a formula, e.g. ~ .x + 2, it is converted to a function...` I was using the formula notation, and my guess is that something weird was happening with the conversion to a function. – Lief Esbenshade Oct 05 '19 at 00:10
  • 2
    I'd suggest "un-accepting" this answer. We want people to look at your question and my answer and see if they can explain what's going on and perhaps propose better code. Fewer people will come by if the answer is already accepted. – eipi10 Oct 05 '19 at 01:26
  • 1
    `~` is a quoting function, and its output is a formula, the dots are no more passed to it that they'd be to the string `"hello ... world"` or to a function argument (with dot formals or not) for that matter. In the accepted solution, the dot are passed to `map_df`, which then deals with them internally, by converting its formula argument to a function if relevant, and feeding to it sequentially the elements of its main argument along with the dots. – moodymudskipper Oct 05 '19 at 13:19