
I'm trying to apply code (from a linked answer) that uses `ldply()` from plyr to combine multiple CSV files into one data frame.

I'm trying to figure out the appropriate tidyverse syntax for adding a column that lists the name of the file each row comes from.

Here's what I have

test <- ldply( .data = list.files(pattern="*.csv"),
              .fun = read.csv,
               header = TRUE) %>%
  mutate(filename=gsub(".csv","",basename(x)))

I get the error message:

Error in basename(x) : object 'x' not found

My understanding is that `basename()` takes a path, but when I set the path to the folder that contains the files, the filename column that gets added just contains the folder name.

Any help is much appreciated!
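To illustrate why the column ended up holding only the folder name: `basename()` returns the last component of whatever path it is given, so it has to receive each file's own path, not the folder's. A minimal base-R sketch; the paths below are hypothetical:

```r
# Hypothetical paths, for illustration only
file_path   <- "data/site_a.csv"
folder_path <- "data"

basename(file_path)    # "site_a.csv"
basename(folder_path)  # "data"  (only the folder name, as described above)

# Stripping the extension from the file's base name leaves just "site_a"
sub("\\.csv$", "", basename(file_path))
```

This is also why the answers below compute the file name inside the per-file function (where the file's path is available) rather than in a `mutate()` applied after the data frames have already been combined.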

NelsonGon
dnmc

3 Answers


You could use `purrr::map_dfr()`:

library(dplyr)  # for mutate() and %>%
purrr::map_dfr(list.files(pattern = "\\.csv$", full.names = TRUE),
    ~read.csv(.x) %>% mutate(file = sub("\\.csv$", "", basename(.x))))
Ronak Shah
  • Thanks for this. I got an error for one of the columns saying it can't be converted from factor to numeric (but as far as I know that specific column is numeric to start with). Is there a parameter that needs to be set differently? Thanks again – dnmc Jun 12 '19 at 14:56
  • @RookieSnowbodah I just tried it on a few CSV files on my system and it worked for me, but maybe you can try using `stringsAsFactors = FALSE` in `read.csv`? – Ronak Shah Jun 12 '19 at 16:02
  • Just realized that the column that's giving me issues has some numbers and some "less than" values, i.e. "<10". When I tried `abc <- purrr::map_dfr(list.files(pattern="*.csv", full.names = TRUE), ~read.csv(.x, stringsAsFactors = FALSE) %>% mutate(file = sub(".csv$", "", basename(.x))))` I get an error saying that column can't be converted from character to numeric. Is there a workaround for this? – dnmc Jun 13 '19 at 02:11
  • @RookieSnowbodah Strange, it should still work. Try reading one file first: run `files <- list.files(pattern="*.csv", full.names = TRUE)` and then check whether you can read it with `read.csv(files[1])`. Also try making a copy of the file and reading that, to see if there is any difference. – Ronak Shah Jun 13 '19 at 02:21
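A likely explanation for the conversion error (an assumption, not confirmed in the thread): `read.csv` guesses each column's type separately for every file, so a column that is all numbers in one file but contains values like "<10" in another comes back numeric in one and character in the other, and `map_dfr()`, which row-binds with `dplyr::bind_rows()`, refuses to combine them. One workaround sketch is to read every column as character and convert afterwards. The files `a.csv` and `b.csv` and the column `result` are hypothetical, and the code assumes purrr and dplyr are installed:

```r
library(purrr)
library(dplyr)

# Two hypothetical files in a temporary folder: in a.csv the column "result"
# is all numeric, in b.csv it contains a censored value "<10".
csv_dir <- tempfile("csvs")
dir.create(csv_dir)
write.csv(data.frame(result = c(5, 12)), file.path(csv_dir, "a.csv"), row.names = FALSE)
write.csv(data.frame(result = c("<10", "15")), file.path(csv_dir, "b.csv"), row.names = FALSE)

files <- list.files(csv_dir, pattern = "\\.csv$", full.names = TRUE)

# colClasses = "character" makes read.csv return every column as character,
# so bind_rows() never has to merge a numeric column with a character one.
combined <- map_dfr(files, ~ read.csv(.x, colClasses = "character") %>%
                      mutate(file = sub("\\.csv$", "", basename(.x))))

# Convert after combining; entries such as "<10" become NA
# (the coercion warning is suppressed deliberately).
combined$result_num <- suppressWarnings(as.numeric(combined$result))
```

If values like "<10" carry meaning (for example a detection limit), they would need to be handled explicitly before the numeric conversion rather than turned into `NA`.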

We can use `imap_dfr()`, which passes each element's name to the function as `.y`:

library(purrr)
library(dplyr)
library(stringr)
library(readr)
files <- list.files(pattern = "\\.csv$", full.names = TRUE)
fileSub <- str_remove(basename(files), "\\.csv$")
imap_dfr(setNames(files, fileSub), ~ read_csv(.x) %>%
          mutate(file = .y))
akrun

I don't know if this helps anyone, but I stumbled across a very simple solution.

Context: when the input vector is named, ldply adds an `.id` column holding the name of the element each row came from. So, to combine multiple CSV files and add a column with the file name, you can do:

library(plyr)

# get the CSV files in the current working directory as a character vector
# (this is the same vector passed as .data in the question's ldply call)
file_names <- list.files(pattern = "\\.csv$")

# Name the items (here the names equal the file names themselves,
# but they could be swapped for sample IDs)
names(file_names) <- file_names

# then use ldply to do the hard work
combined_csv <- ldply(file_names, read.csv)

# The names are stored in the .id column
combined_csv$.id
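To get a column named `filename` without the `.csv` extension, which is what the question asked for, the names can be cleaned before calling `ldply()`, and its `.id` argument sets the name of the index column. A sketch assuming plyr is installed; the two files are hypothetical and are written to a temporary folder for the demo:

```r
library(plyr)

# Hypothetical files written to a temporary folder for the demo
csv_dir <- tempfile("csvs")
dir.create(csv_dir)
write.csv(data.frame(x = 1:2), file.path(csv_dir, "site_a.csv"), row.names = FALSE)
write.csv(data.frame(x = 3:4), file.path(csv_dir, "site_b.csv"), row.names = FALSE)

file_names <- list.files(csv_dir, pattern = "\\.csv$", full.names = TRUE)

# Use the bare file name without the extension as each element's name
names(file_names) <- sub("\\.csv$", "", basename(file_names))

# .id names the index column that ldply() builds from those names;
# per plyr's documentation, a non-NA .id stores that column as a factor
combined_csv <- ldply(file_names, read.csv, .id = "filename")
```

Wrap the column in `as.character()` if plain strings are needed downstream.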
Benjamin Simpson