Assume you have 2 files as follows.

file_1_october.csv
file_2_november.csv

The files have identical columns. I can easily read both files into R with map, but I also want each resulting data frame to include a column, month, holding the name of the file it came from. For instance, for file_1_october.csv, I want a column called “month” that contains “file_1_october.csv”.

For reproducibility, assume file_1_october.csv is

name,age,gender
james,24,male
Sue,21,female

While file_2_november.csv is

name,age,gender
Grey,24,male
Juliet,21,female

I want to read both files and add to each a month column that holds the corresponding file name, so that we have:

name,age,gender,month
james,24,male,file_1_october.csv
Sue,21,female,file_1_october.csv

and

name,age,gender,month
Grey,24,male,file_2_november.csv
Juliet,21,female,file_2_november.csv
John Karuitha

2 Answers

Maybe something like this?

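library(dplyr) # provides %>% and mutate()

csvlist <- c("file_1_october.csv", "file_2_november.csv")

# Read each file and add a month column holding its file name
df_list <- lapply(csvlist, function(x) read.csv(x) %>% mutate(month = x))

# Optional: copy each list element into its own variable (df1, df2, ...)
for (i in seq_along(df_list)) {
  assign(paste0("df", i), df_list[[i]])
}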

The two dataframes will be saved in df1 and df2.
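
If you would rather not create numbered variables, one option is to keep the data frames in a list named after the files. The following is only a sketch in base R: it recreates the two sample files from the question in a temporary folder so it can run anywhere, and the names tmp and csv_paths are made up for the illustration.

```r
# Recreate the two sample files from the question in a temporary folder
# (tmp and csv_paths are illustrative names, not part of the original answer)
tmp <- tempfile("csv_demo_")
dir.create(tmp)
writeLines(c("name,age,gender", "james,24,male", "Sue,21,female"),
           file.path(tmp, "file_1_october.csv"))
writeLines(c("name,age,gender", "Grey,24,male", "Juliet,21,female"),
           file.path(tmp, "file_2_november.csv"))

csv_paths <- list.files(tmp, pattern = "\\.csv$", full.names = TRUE)

# Read each file and tag its rows with the file name (folder part dropped)
df_list <- lapply(csv_paths, function(p) {
  df <- read.csv(p)
  df$month <- basename(p)
  df
})

# Name the list elements after the files so they can be looked up by name
names(df_list) <- basename(csv_paths)

df_list[["file_1_october.csv"]]
```

Looking tables up by file name tends to hold up better than df1, df2, ... when the number of files changes.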

benson23

Here's a (mostly) tidyverse alternative that avoids an explicit for loop:

library(tidyverse)

csv_names <- list.files(path = "path/", # set the path to your folder with csv files
                        pattern = "\\.csv$", # regular expression: names ending in .csv
                        full.names = TRUE) # output full file names (with path)
# csv_names <- c("file_1_october.csv", "file_2_november.csv")

csv_names2 <- data.frame(month = basename(csv_names), # file name without the folder
                         id = as.character(seq_along(csv_names))) # id for joining

data <- csv_names %>% 
  lapply(read_csv) %>% # read every file into a list of tibbles
  bind_rows(.id = "id") %>% # bind all tables into one object; id is the position in the list
  left_join(csv_names2, by = "id") # join month column created earlier

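This gives a single data object with data from all the CSVs together. If you need them separately, you can stop after the lapply() step, which leaves a list of tables ("tibbles"); note that the id column, and therefore the join, only exists after bind_rows(). Alternatively, keep the combined table and break it apart again with split(data, data$month), then call list2env() on the resulting named list if you want one object per file.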
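
Since the question mentions map, here is a rough sketch of the same idea with purrr that skips the helper table: naming the paths before reading lets bind_rows() write those names straight into the month column. It assumes readr 2.0 or later (for show_col_types) and recreates the sample files in a temporary folder; tmp, csv_paths, combined and by_file are illustrative names only.

```r
library(readr)
library(purrr)
library(dplyr)

# Recreate the two sample files from the question in a temporary folder
tmp <- tempfile("csv_demo_")
dir.create(tmp)
writeLines(c("name,age,gender", "james,24,male", "Sue,21,female"),
           file.path(tmp, "file_1_october.csv"))
writeLines(c("name,age,gender", "Grey,24,male", "Juliet,21,female"),
           file.path(tmp, "file_2_november.csv"))

csv_paths <- list.files(tmp, pattern = "\\.csv$", full.names = TRUE)

combined <- csv_paths %>%
  set_names(basename(csv_paths)) %>%                        # list names = file names
  map(function(p) read_csv(p, show_col_types = FALSE)) %>%  # one tibble per file
  bind_rows(.id = "month")                                  # names become the month column

# Break the combined table back into one tibble per file, keyed by file name
by_file <- split(combined, combined$month)

# Optionally create one object per file in the current environment:
# list2env(by_file, envir = environment())
```

Here the month column ends up first in combined; dplyr::relocate() can move it to the end if the column order matters.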