combining ldply to combine multiple csv files AND add column with file names via mutate/basename

Question

I'm trying to apply the code here, which uses ldply to combine multiple csv files into one dataframe.

I'm trying to figure out the appropriate tidyverse syntax to add a column that lists the name of the file each row of data comes from.

Here's what I have

test <- ldply( .data = list.files(pattern="*.csv"),
              .fun = read.csv,
               header = TRUE) %>%
  mutate(filename=gsub(".csv","",basename(x)))

I get the message

"Error in basename(x) : object 'x' not found".

My understanding is that basename() takes a path, but when I set the path to the folder that contains the files, the filename column that gets added just has the folder name.

Any help is much appreciated!

score 2 · Accepted Answer · answered Jun 12 '19 at 07:41

You could use purrr::map_dfr

library(dplyr)  # provides %>% and mutate()

purrr::map_dfr(list.files(pattern = "\\.csv$", full.names = TRUE),
    ~read.csv(.x) %>% mutate(file = sub("\\.csv$", "", basename(.x))))

Ronak Shah

Thanks for this. I got an error for one of the columns saying it can't be converted from factor to numeric (but as far as I know that specific column is numeric to start with). Is there a parameter that needs to be set differently? Thanks again – dnmc Jun 12 '19 at 14:56
@RookieSnowbodah I just tried it on a few csv files on my system and it worked for me, but maybe you can try using `stringsAsFactors = FALSE` in `read.csv`? – Ronak Shah Jun 12 '19 at 16:02
Just realized that the column that's giving me issues has some numbers and some "less than" values, e.g. "<10". When I tried `abc <- purrr::map_dfr(list.files(pattern="*.csv", full.names = TRUE), ~read.csv(.x, stringsAsFactors = FALSE) %>% mutate(file = sub(".csv$", "", basename(.x))))` I get an error saying that the column can't be converted from character to numeric. Is there a workaround for this? – dnmc Jun 13 '19 at 02:11
@RookieSnowbodah Strange, it should still work. Try to read one file first. Do `files <- list.files(pattern="*.csv", full.names = TRUE)` and then check whether you are able to read it with `read.csv(files[1])`. Also try creating a copy of the file and reading that, to see if there is any difference. – Ronak Shah Jun 13 '19 at 02:21
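
A possible explanation for the error in the comments above, which the thread itself does not confirm: if the column is read as numeric in some files and as character in others (because of entries like "<10"), `map_dfr` cannot bind the pieces, since `dplyr::bind_rows` refuses to combine a numeric column with a character one. The sketch below is one way around that, under the assumption that reading everything as character and converting afterwards is acceptable. The demo files, the column names `id` and `value`, and the `below_limit` flag are all hypothetical and only serve to make the example runnable.

```r
library(purrr)
library(dplyr)

# Hypothetical demo files: "value" is purely numeric in a.csv,
# but contains a "<10" entry in b.csv.
demo_dir <- file.path(tempdir(), "csv_demo")
dir.create(demo_dir, showWarnings = FALSE)
write.csv(data.frame(id = 1:2, value = c(5, 12)),
          file.path(demo_dir, "a.csv"), row.names = FALSE)
write.csv(data.frame(id = 3:4, value = c("<10", "15")),
          file.path(demo_dir, "b.csv"), row.names = FALSE)

files <- list.files(demo_dir, pattern = "\\.csv$", full.names = TRUE)

# Read every column as character so all pieces have matching types
# when they are bound together.
combined <- map_dfr(files, function(f) {
  read.csv(f, colClasses = "character") %>%
    mutate(file = sub("\\.csv$", "", basename(f)))
})

# Convert after binding. Flagging "<" entries and keeping the bare
# number is just one possible policy, assumed here for illustration.
combined <- combined %>%
  mutate(id = as.integer(id),
         below_limit = grepl("^<", value),
         value = as.numeric(sub("^<", "", value)))
```

How to treat the "<10" style values (keep the limit, halve it, set it to `NA`) depends on the data, so the last step is the part most likely to need changing.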

score 0 · Answer 2 · answered Jun 12 '19 at 14:14

We can use imap

library(purrr)
library(dplyr)
library(stringr)
library(readr)
files <- list.files(pattern="*.csv", full.names = TRUE)
fileSub <- str_remove(basename(files), "\\.csv$")
imap_dfr(setNames(files, fileSub), ~ read_csv(.x) %>%
          mutate(file = .y))

score 0 · Answer 3 · answered Feb 12 '21 at 11:40

I don't know if this helps anyone, but I stumbled across this solution, which is very simple.

Context: the .id column created by ldply lists the names of each item in your input vector. So, to combine multiple csv files and create a new column with the file name, you can do:

library(plyr)  # provides ldply

# Get the csv files in the current working directory as a character vector
file_names <- list.files(pattern="*.csv") # in the question's example this is the .data argument

# Name the items (here the names equal the items themselves, but they can be swapped for sample IDs)
names(file_names) <- file_names

# Then let ldply do the hard work
combined_csv <- ldply(file_names, read.csv)

# The names are stored in the .id column
combined_csv$.id
