
I have a basic problem in R. I need to merge 72 Excel files with the same type of data and the same variables into a single data set in R. I used the code below for merging, but this is not practical for so many files. Can anyone help me, please?

data1 <- read.csv("D:/Customer_details1/data-01.csv")
data2 <- read.csv("D:/Customer_details2/data-02.csv")
data3 <- read.csv("D:/Customer_details3/data-03.csv")

# full outer join, two files at a time
data_merged <- merge(data1, data2, all.x = TRUE, all.y = TRUE)
data_merged2 <- merge(data_merged, data3, all.x = TRUE, all.y = TRUE)
aninditab

2 Answers


First, if the extensions are .csv, they're not Excel files; they're .csv files.

We can leverage the apply family of functions to do this efficiently.

First, let's create a list of the files:

setwd("D://Customer_details1/")

#  create a list of all files in the working directory with the .csv extension
files <- list.files(pattern = "\\.csv$")
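
If the 72 files live in separate folders such as D:/Customer_details1, D:/Customer_details2, and so on (as the paths in the question suggest), a hedged variant is to list them recursively from the common parent and keep the full paths so read.csv can find them. The D:/ root and the data-*.csv naming are taken from the question; your layout may differ:

# list the data-*.csv files in all subfolders of D:/ and keep full paths
files <- list.files("D:/", pattern = "^data-.*\\.csv$",
                    full.names = TRUE, recursive = TRUE)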

Let's use purrr::map_dfr in this case, although we could also use lapply (updated from map plus reduce to map_dfr, which automatically rbinds the results into a single data frame):

library(purrr)

mainDF <- files %>% map_dfr(read.csv) 

You can pass additional arguments to read.csv if you need to: map_dfr(read.csv, ...)
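
For example, a minimal sketch that continues from the snippet above and passes one extra argument through to read.csv (stringsAsFactors is only an illustrative choice; substitute whatever your files need):

mainDF <- files %>% map_dfr(read.csv, stringsAsFactors = FALSE)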

Note that for rbind to work the column names have to be the same, and I'm assuming they are based on your question.
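
If you would rather stay in base R, here is a minimal sketch of the lapply route mentioned above, under the same assumption that every file has identical column names:

# read each file into a list of data frames, then stack them row-wise
list_of_dfs <- lapply(files, read.csv)
mainDF <- do.call(rbind, list_of_dfs)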

Mako212
    You can use `map_df` and avoid the `reduce` step. – beigel Sep 19 '17 at 17:14
  • No problem, it looks like you made a small typo though, listing `map_dfr` instead of `map_df`. You might wanna fix it so as to not confuse anyone. – beigel Sep 19 '17 at 23:12
  • @beigel Nope, if you look at `?purrr` you'll see that `map_dfr` is the `rbind` version, and `map_dfc` is the `cbind` version. Looking at the underlying formulas `map_df` and `map_dfr` appear to be defined exactly the same (i.e. using `rbind`) though, so you're not wrong that `map_df` is also correct. – Mako212 Sep 20 '17 at 15:14
  • My bad. `map_df` doesn't appear in the docs while `map_dfr` and `map_dfc` do, so I'm guessing the latter two are the preferred way to invoke `map_` since they're more explicit. – beigel Sep 20 '17 at 20:40
  • @aninditab Do you have both `purrr` and `dplyr` installed? – Mako212 Sep 20 '17 at 20:48
  • @Mako212 Is there a way to use this method and only map_dfr certain column names in all the files? I have a lot of excel files that have some columns that I want but there are some files that have columns that other files don't have... Any idea how I can do this? Thanks. – NBE Nov 14 '18 at 17:36
  • @KWANGER I'd recommend asking a new question for this, because there's more subtlety to this than I can address in a comment. In broad terms, you can either select a common subset of columns before combining into a single table, or create NA values for missing columns in the smaller files, preserving all the data; see the sketch below. – Mako212 Nov 14 '18 at 17:44
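
A hedged sketch of the two options from the last comment, assuming files is the vector of paths from the answer above; dplyr::bind_rows (which map_dfr uses internally) already fills columns missing from a file with NA:

library(dplyr)
library(purrr)

dfs <- map(files, read.csv)

# option 1: keep every column; files lacking a column get NA in it
all_columns <- bind_rows(dfs)

# option 2: keep only the columns present in every file
common <- Reduce(intersect, map(dfs, names))
shared_columns <- map_dfr(dfs, ~ .x[, common, drop = FALSE])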
#Method I: read every sheet of one workbook with readxl and row-bind them
library(readxl)
library(tidyverse)

path <- "C:/Users/Downloads"
setwd(path)

# import one sheet of country.xlsx by name
fn.import <- function(x) read_excel("country.xlsx", sheet = x)

sheets <- excel_sheets("country.xlsx")
# read every sheet into a named list, then stack into one data frame
sheet_list <- lapply(setNames(sheets, sheets), fn.import)
combined <- bind_rows(sheet_list, .id = "Sheet")
print(combined)

#Method II: let rio import and bind everything in one call
install.packages("rio")  # install once if needed
library(rio)

path <- "C:/Users/country.xlsx"
data <- import_list(path, rbind = TRUE)
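
If the goal is the question's 72 separate CSV files rather than the sheets of one workbook, rio::import_list() also accepts a vector of file paths. A sketch, assuming the files sit under D:/ as in the question:

library(rio)

files <- list.files("D:/", pattern = "\\.csv$", full.names = TRUE, recursive = TRUE)
data <- import_list(files, rbind = TRUE)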