This is a tricky dplyr & purrr question. I want to simplify the following code into one dplyr pipe:

filenames <- list.files(path = data.location, pattern = "*.csv") %>%
  map_chr(function(name) gsub(paste0('(.*).csv'), '\\1', name))

files.raw <- list.files(path = data.location, pattern = "*.csv", full.names = TRUE) %>%
  map(read_csv) %>%
  setNames(filenames)

I tried this solution, but it failed: read_csv() needs the file names with their full path (full.names = TRUE), whereas I want to assign the names without the full path.

In other words, this worked - but only with full path in filenames:

list.files(path = data.location, pattern = "*.csv", full.names = TRUE) %>%
  { . ->> filenames } %>%
  map(read_csv) %>%
  setNames(filenames)

but this didn't:

list.files(path = data.location, pattern = "*.csv", full.names = TRUE) %>%
{ map_chr(., function(name) gsub(paste0(data.location, '/(.*).csv'), '\\1', name)) ->> filenames } %>%
  map(read_csv) %>% 
  setNames(filenames)

Is there a way to make the map_chr work with the save (->> filenames), or is there an even simpler way to completely avoid saving to a temporary variable (filenames)?

Agile Bean

4 Answers

2

To do it in one pipeline without intermediate values, similar to @Ronak Shah's answer, why not set the names first and then read in the CSVs? Ronak nests the setNames call, but it can be put in the pipeline to make it more readable:

library(tidyverse)
list.files(path = data.location, pattern = "*.csv", full.names = TRUE) %>%
    setNames(., sub("\\.csv$", "", basename(.))) %>% 
    map(read_csv)
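
The reason this works is that setNames() only attaches names to the vector of paths; the values themselves stay the full paths. Here is a minimal sketch of that behaviour, assuming a throwaway directory with two made-up CSV files (a.csv and b.csv are hypothetical), and using base lapply() and read.csv() in place of map() and read_csv() so that it runs without extra packages:

```r
# Create a throwaway directory with two small CSV files (hypothetical data).
data.location <- file.path(tempdir(), "csv_demo")
dir.create(data.location, showWarnings = FALSE)
write.csv(data.frame(x = 1:2), file.path(data.location, "a.csv"), row.names = FALSE)
write.csv(data.frame(x = 3:5), file.path(data.location, "b.csv"), row.names = FALSE)

paths <- list.files(path = data.location, pattern = "\\.csv$", full.names = TRUE)

# setNames() changes only the names attribute; the values remain the full paths.
named_paths <- setNames(paths, sub("\\.csv$", "", basename(paths)))

# lapply() (like purrr::map()) reads from the values and carries the names over.
files.raw <- lapply(named_paths, read.csv)
```

Here names(files.raw) should be the short names, while each element was read from its full path.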
pgcudahy
  • thanks but this doesn't work (tried that before) because by setting the names, the file path is removed by `basename()` so `read_csv` doesn't find the files – Agile Bean Sep 23 '19 at 16:17
  • With `data.location <- "~/Desktop/csvs"`, running `list.files(path = data.location, pattern = "*.csv", full.names = TRUE) %>% setNames(., sub("\\.csv$", "", basename(.))) %>% map(print)` gives `$a '/Users/pgcudahy/Desktop/csvs/a.csv'`, `$b '/Users/pgcudahy/Desktop/csvs/b.csv'`, `$c '/Users/pgcudahy/Desktop/csvs/c.csv'` – pgcudahy Sep 24 '19 at 05:34
  • Ok! @pgcudahy, I mistakenly thought I had tried your solution. Now that I have tried it, I admit it is better than mine because it is shorter and more intuitive, so I am marking it as the accepted solution. – Agile Bean Sep 28 '19 at 16:59
1

Try using this method:

all_files <- list.files(path = data.location, pattern = "*.csv", full.names = TRUE) 

purrr::map(all_files, readr::read_csv) %>%
      setNames(sub("\\.csv$", "", basename(all_files)))

Here, we first get the complete paths of all files and use them to read the files with read_csv. We then use basename to get only the file name, remove the ".csv" extension from it, and assign the names using setNames.
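
As a small sketch of just the name-cleaning step, here is what basename() and sub() do on two made-up paths (the paths are hypothetical and no files are read):

```r
# Hypothetical file paths, used only for illustration.
paths <- c("/tmp/data/sales.csv", "/tmp/data/my.csv.backup.csv")

# basename() drops the directory part and keeps the file name.
file_names <- basename(paths)

# The anchored pattern "\\.csv$" removes only a trailing ".csv",
# so a ".csv" in the middle of a name is left untouched.
clean_names <- sub("\\.csv$", "", file_names)
```

Escaping the dot and anchoring with $ keeps the pattern from matching anything other than the literal extension at the end of the name.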


To do this in one pipe, we can do:

list.files(path = data.location, pattern = "*.csv", full.names = TRUE)  %>%
    {setNames(map(., read_csv), sub("\\.csv$", "", basename(.)))}
Ronak Shah
  • this is indeed a smart way to reuse the file paths but can you make this into one dplyr pipe? – Agile Bean Sep 21 '19 at 15:11
  • @AgileBean It makes it a bit ugly but I have updated the answer to do this in one pipe. – Ronak Shah Sep 22 '19 at 01:32
  • I see your idea, and you're right, it gets uglier than your first solution. It seems to be impossible to make it more readable. I will try though... – Agile Bean Sep 22 '19 at 08:33
1

We can do this with tidyverse functions (list.files and basename are still base R):

library(readr)
library(purrr)
library(dplyr)
library(stringr)  # provides str_remove()
all_files <- list.files(path = data.location, pattern = "*\\.csv", full.names = TRUE)
map(all_files, read_csv) %>%
  set_names(str_remove(basename(all_files), "\\.csv$"))
akrun
0

Inspired by

  1. the below answer by @Ronak Shah
  2. the intermediary assignment suggested by @G. Grothendieck here

I put together the following solution which

  1. combines the reading and naming of files into one dplyr pipe (my initial goal)
  2. makes the code more intuitive to read (my always implicit goal) - important for collaborators or in general.

So here's my all-in-one dplyr pipe:

list.files(path = data.location, pattern = "*.csv", full.names = TRUE)  %>%
  { filenames <<- . } %>%
  map(read_csv) %>%
  setNames(filenames %>% basename %>% map(~gsub(paste0('(.*).csv'), '\\1', .x)))

The only tricky part in the above solution is that the assignment has to use the <<- operator, as <- would not work. If you want to use the latter, you can do so by wrapping the rest of the pipeline in braces, as also suggested by @G. Grothendieck in the above post:

list.files(path = data.location, pattern = "*.csv", full.names = TRUE)  %>%
{
  { filenames <- . } %>%
    map(read_csv) %>%
    setNames(filenames %>% basename %>% map(~gsub(paste0('(.*).csv'), '\\1', .x)))
}
Agile Bean