1

I'm looking for a way to automate some data clean-up. So far, I've figured out how to import several excel workbooks and all their sheets into R using the following:

my_data = expand_grid(
    file = list.files("~path", full.names = TRUE),
     sheet = seq(6)
 ) %>%
     transmute(data = file %>% map2(sheet, ~ read_excel(path = .x, sheet = .y, skip = 2))) %>%
     pull(data)

What I need help with is exporting all of the resulting tibbles into an Excel workbook, one tibble per sheet.

I'm using the following method to try to print (based on a previously answered question):

library(openxlsx)
library(tibble)
output_expediter <- function(df, output_filename) {
     nm1 <- deparse(substitute(df))
     wb <- createWorkbook()
     addWorksheet(wb, sheetName = nm1)
     writeData(wb, sheet= nm1, x = df)
     saveWorkbook(wb =wb, file = output_filename, overwrite = TRUE)
     }

I get the following error:

Error in (function (..., row.names = NULL, check.rows = FALSE, check.names = TRUE, : arguments imply differing number of rows: 4, 52, 8, 0, 5

It is true that all the tibbles differ in row numbers.I'm unsure of how to proceed.

Any help would be greatly appreciated.

1 Answers1

1

Here is a way to do it:

library(purrr)

my_data <- list.files(pattern = ".xlsx") |>
  set_names() |>
  map(rio::import_list)
  
iwalk(my_data, ~ writexl::write_xlsx(.x, .y))

How it works

  1. We use set_names which gives a named vector, where the names and values (for the moment) are the file path to the workbooks.
  wb1.xlsx   wb2.xlsx 
"wb1.xlsx" "wb2.xlsx"
  1. We iterate over this named list using map and to each element we apply rio::import_list. This will import all sheets of a workbook, preserving the sheet names. By doing this, we have preserved both the workbook path AND the worksheet names:
map(my_data, \(x) map(x, \(y) head(y, 1)))

$wb1.xlsx
$wb1.xlsx$a
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa

$wb1.xlsx$b
  mpg cyl disp  hp drat   wt  qsec vs am gear carb
1  21   6  160 110  3.9 2.62 16.46  0  1    4    4

$wb1.xlsx$c
            name height mass hair_color skin_color eye_color birth_year  sex
1 Luke Skywalker    172   77      blond       fair      blue         19 male
     gender homeworld species films vehicles starships
1 masculine  Tatooine   Human    NA       NA        NA


$wb2.xlsx
$wb2.xlsx$c
            name height mass hair_color skin_color eye_color birth_year  sex
1 Luke Skywalker    172   77      blond       fair      blue         19 male
     gender homeworld species films vehicles starships
1 masculine  Tatooine   Human    NA       NA        NA

$wb2.xlsx$b
  mpg cyl disp  hp drat   wt  qsec vs am gear carb
1  21   6  160 110  3.9 2.62 16.46  0  1    4    4

$wb2.xlsx$a
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa
  1. Lastly we iterate over each workbook name and write all the data frames to sheets within a workbook. By default writexl::write_xlsx will use the list names as sheet names.

Data

I created some fake workbooks in my current working directory:

l1 <- list(iris, mtcars, dplyr::starwars)
names(l1) <- letters[seq(length(l1))]
l2 <- rev(l1)

writexl::write_xlsx(l1, "wb1.xlsx")
writexl::write_xlsx(l2, "wb2.xlsx")
LMc
  • 12,577
  • 3
  • 31
  • 43
  • Thank you for the response! I'm still running into a problem. my_data <- list.files(pattern = ".xlsx") |> set_names() Works just fine. map(rio::import_list) Is giving me the following error: Error in `map()`: ℹ In index: 1. With name: ~filename.xlsx Caused by error: ! `path` does not exist: ~filename.xlsx – Maria Hernandez Aug 02 '23 at 20:37
  • The code I have posted likely needs to be adjusted for your file system. As I mentioned, I created these workbooks in my *current working directory*. You should specify the `path` argument of `list.files` to point to the location where your workbooks exist. – LMc Aug 02 '23 at 20:52
  • I did specify path within it like so: my_data <- list.files(path = "~path", pattern = ".xlsx")|> set_names() |> map(rio::import_list) Is there somewhere else I should be doing it? Sorry for all the questions, and thank you so much for your help! – Maria Hernandez Aug 03 '23 at 13:11
  • You are welcome! Can shoe me the output of `list.files(path = "~path", pattern = ".xlsx")|> set_names()`? (Without the `map(rio::import_list)`). – LMc Aug 03 '23 at 14:23
  • ```> my_data Application Support Analyst_Canada_Job Post.xlsx Application Support Analyst_Montreal_Job Post.xlsx "Application Support Analyst_Canada_Job Post.xlsx" "Application Support Analyst_Montreal_Job Post.xlsx" Application Support Analyst_Toronto_Job Post.xlsx "Application Support Analyst_Toronto_Job Post.xlsx" ``` It does grab each file name. – Maria Hernandez Aug 03 '23 at 15:23
  • Ah those are just file names so it is looking in your current working directory for them. You should probably specify `list.files(..., full.names = TRUE)` – LMc Aug 03 '23 at 15:57