
I read a set of CSV files with purrr::map and got a large list of tibbles.

  csv_files <- list.files(path = data_path, pattern = '\\.csv$', full.names = TRUE)
  all_csv <- purrr::map(csv_files, readr::read_csv2)
  names(all_csv) <- gsub(data_path, "", csv_files)
  return(all_csv)

EDITED as suggested by @Spacedman

I further need to process each tibble/data frame separately within the process_csv_data function.

purrr::map(all_csv, process_csv_data)

How can I retrieve the name of a single item of the large list, from within process_csv_data, without using a for loop?

Yann
  • Like `names(all_csv)[42]` for example? – Spacedman Oct 24 '17 at 12:05
  • Also, use `basename(csv_files)` to get the file name part of the path. `gsub` fails if `data_path` is `"."`, which it was when I tried this. – Spacedman Oct 24 '17 at 12:06
  • @Spacedman Is it the reason for the downvote? As I said, I'm avoiding a for loop and therefore I shouldn't have an index to use the bracket operator [. – Yann Oct 24 '17 at 12:07
  • I think you should say *within the process_csv_data function* for clarity. – Spacedman Oct 24 '17 at 13:17
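
To illustrate the `basename` comment above, here is a minimal sketch in base R. The paths are hypothetical and no files are read; it only compares how the two approaches turn a path into a list name when `data_path` is `"."`.

```r
# Hypothetical paths, for illustration only; nothing is read from disk.
csv_files <- c("./a.csv", "./b.csv")
data_path <- "."

# gsub() treats "." as a regex wildcard, so every character is deleted.
stripped <- gsub(data_path, "", csv_files)

# basename() keeps only the last component of each path.
file_names <- basename(csv_files)

print(stripped)
print(file_names)
```

Even with a literal directory name, `gsub` replaces every occurrence of the pattern, including any inside the file name itself, so `basename` is likely the safer choice for naming the list.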

2 Answers


Use map2, as in this reproducible example:

> L = list(a=1:10, b=1:5, c=1:6)
> map2(L, names(L), function(x,y){message("x is ",x," y is ",y)})
x is 12345678910 y is a
x is 12345 y is b
x is 123456 y is c

The value of x gets a bit munged by message (its elements are pasted together), but it is the corresponding list element of L, and y is that element's name.
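
Applied to the question, the same pattern might look like the sketch below. It assumes `process_csv_data` can be given a second argument for the name; that signature, the toy data frames, and the `source` column are all hypothetical stand-ins, not the asker's actual code.

```r
library(purrr)

# Hypothetical stand-in for the asker's function: it receives the data frame
# and, as a second argument, the name it is stored under in the list.
process_csv_data <- function(df, name) {
  df$source <- name  # tag every row with the originating list name
  df
}

# Toy data frames in place of the CSV files read from disk.
all_csv <- list(
  "a.csv" = data.frame(x = 1:2),
  "b.csv" = data.frame(x = 3:5)
)

# map2() walks the list and its names in parallel; the result keeps the names.
processed <- map2(all_csv, names(all_csv), process_csv_data)

# imap() is shorthand for the same call on a named list.
processed_imap <- imap(all_csv, process_csv_data)
```

Inside the function, `name` is then available for each element without any explicit index or for loop.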

Spacedman
  • `imap` was devised to make these usages of `map2` sexier, your answer can be simplified into: `imap(L,~message("x is ",.x," y is ",.y))` – moodymudskipper Oct 27 '17 at 09:16
  • see also `lmap`, that allows you to loop on `list-elements` (sublists of length 1) : `lmap(L,~ {message("x is ",.x[[1]]," y is ",names(.x));return(list(NULL))})` – moodymudskipper Oct 27 '17 at 09:22

You can take advantage of purrr to keep all the data in a single, nested tibble. That way each csv and processed csv remains linked directly with the appropriate csv-name:

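library(tidyverse)  # loads tibble, dplyr, purrr and readr

csv_files <- list.files(path = data_path, pattern = '\\.csv$', full.names = TRUE)

all_csv <- tibble(csv_files) %>%
    mutate(data = map(csv_files, read_csv2),                # read each file into a list-column
           processed = map(data, process_csv_data),         # process each tibble
           csv_files = gsub(data_path, "", csv_files)) %>%  # strip the directory from the name
    select(-data)                                           # drop the raw data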
David Klotz
  • This works fine but it's a bit hard to retrieve the data by `all_csv$processed$name_of_file`. Alternatively, you can create a list of tibbles that can be accessed directly: `all_csv <- list.files(path = data_path, pattern = "*.csv", full.names = TRUE) %>% map(read_csv) %>% setNames(csv_files)` So you can get a file just by `all_csv$name_of_file` – Agile Bean Aug 08 '19 at 04:06
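
On the retrieval concern raised in the comment above, here is a minimal sketch of two ways to pull one processed item out of the nested tibble by file name. The tibble below is a toy stand-in with invented file names and contents; only the column names `csv_files` and `processed` come from the answer.

```r
library(tibble)

# Toy stand-in for the nested tibble; file names and data are invented.
all_csv <- tibble(
  csv_files = c("a.csv", "b.csv"),
  processed = list(data.frame(x = 1:2), data.frame(x = 3:5))
)

# Option 1: locate the row by file name, then unwrap the list-column entry.
b_data <- all_csv$processed[[which(all_csv$csv_files == "b.csv")]]

# Option 2: convert the list-column to a named list for `[[` or `$` access.
by_name <- setNames(all_csv$processed, all_csv$csv_files)
a_data <- by_name[["a.csv"]]
```

The second option gives roughly the `all_csv$name_of_file` style of access while the tibble itself keeps names and results side by side.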