1

I want to merge columns of multiple xlsx files (about 100) which all have the same structure, meaning the all have 5 rows and 2 columns.

I'm using the answer provided here (with a slight edit to merge the columns instead of rows)

library(purrr)
library(writexl)


files <- list.files(pattern="*.xlsx")
mainDF <- files %>% map_dfc(read.xlsx) 

write_xlsx(mainDF, "merge.xlsx")

but what I end up with is that the first row of every sheet is now the column name.

How do I keep the original column names in the merged file?

Also in the merged file there are new columns named "file" which display the file name, I would like to remove those as well.

Any help would be appreciated :)

EDIT

Example: I have file1.xlsx, file2.xlsx looking like

Data col1              Data  col2
 x    1                 x     4
 y    3                 y     6

and my output is

1 4
3 6

but my goal is to have mergefile.xlsx

Data col1 col2
  x   1    4
  y   3    6
maryam
  • 101
  • 5
  • Is the first row of every file blank? – Dave2e Jul 17 '21 at 14:58
  • The first row of the single files contains the column name, but after merging the column names are numbers (data used in the single files) – maryam Jul 17 '21 at 15:00
  • are you horizontally binding these files? I mean adding columns horizontally? – monte Jul 17 '21 at 15:11
  • @monte Yes, I guess another way to say it would be to take only the second column of every file and append it to the first file (if that makes sense) – maryam Jul 17 '21 at 15:14
  • 2
    I suggest you use `map` instead of `map_dfc` so that your `mainDF` is a list whose elements are your excel tables and then use `mainDF %>% reduce(~ full_join(..1, ..2, by = "Data"))` to bind them. – Anoushiravan R Jul 17 '21 at 15:16
  • So if you were to add a 3rd file, will that show as `col3` in final dataframe? – monte Jul 17 '21 at 15:16
  • I would try using the `bind_cols` function from the dplyr package and avoid using the `map_dfc()` function. – Dave2e Jul 17 '21 at 15:24

1 Answers1

1

I think your problem is that read.csv doesn't set colnames of the dataframe that it produces to the headers of the input file. You can use readr::read_csv to get this behavior.

library(dplyr)
library(readr)

files <- list.files(pattern="*.csv")
dfs <- lapply(files, read_csv)

combined_df <- Reduce(function(x, y) full_join(x, y, by = "Data"), dfs)

write_xlsx(combined_df, "merge.xlsx")
Akindele Davies
  • 391
  • 1
  • 5