1

I have 350+ CSV files I need to merge into one aggregate file. Using Power Query for Excel, I was able to easily merge all of the files in a few minutes. However, after inspection, not all of the files follow the same column order or name. For example, some files go by:

first, last, address, city, state, username

But some other files goes by:

address1, address2, city, state, last_name, first_name, age

Using Power Query, Excel, VBA, or something else - is there a way to effectively merge multiple multiple CSV files with different column names and column order? I tried searching online, but none of the solutions are related to what I was looking for (example).

The end result with the example above should look like this:

username, last_name, first_name, age, address1, address2, city, state

Any extra columns that don't exist in one sheet can be blank (or null) and the finalized column order doesn't matter because that can be rearranged later.

Ethan O'Sullivan
  • 323
  • 8
  • 19

1 Answers1

3

Similar to this related post, the Table.Combine function should do the trick, assuming you have headers to match columns with.

Load the CSV files in via Load from Folder, convert CSV to table format and promote headers, then combine with Table.Combine.

let
    Source = Folder.Files("C:\Path\To\Folder\With\CSVs"),
    #"Added Custom" = Table.AddColumn(Source, "CSV", each Table.PromoteHeaders(Csv.Document([Content]))),
    #"Combine CSVs" = Table.Combine(#"Added Custom"[CSV])
in
    #"Combine CSVs"
Alexis Olson
  • 38,724
  • 7
  • 42
  • 64
  • The `Table.Combine` looks like something I need. I ran the query you provided and it looks like it works as intended when I look at the preview. However, when I save and load the newly merged data, it's still loading and count 30+ million rows so far. I know I only have a couple of 100k+ rows altogether. Any reason why it's generating so much? – Ethan O'Sullivan Nov 24 '20 at 23:20
  • I don't know what's going on with that unless it's picking up other files in that folder you don't intend to process. If you filter the list of files down to a handful before `#"Added Custom"`, does it act like you expect? – Alexis Olson Nov 24 '20 at 23:25
  • The folder only has the CSVs that I am merging. I did it with just 5 files and it came out as expected. Yet, if I do it to all 350 files in the same folder, it goes to the millions. Is there a limit? I added your formula in the Add Column > Custom Column. After I run it, I expand the new custom column. Then, I preview all of the columns and it looks like the expected outcome both times in the preview – Ethan O'Sullivan Nov 24 '20 at 23:44
  • When I went back and manually counted the total amount of records for a small batch of files, it would be 6,000. But when I run the query for the same small batch of files, it's 60,000. So I looked at the table and it's being duplicated. I removed the dupes and then it finally matches the total amount that I manually counted. Any way t prevent duplicates? – Ethan O'Sullivan Nov 24 '20 at 23:55
  • I haven't been able to reproduce the duplication behavior you're describing, so I can't tell what's going on. – Alexis Olson Nov 25 '20 at 02:31
  • Looks like I was able to solve it actually. Before I hit Close & Load after your formula, I selected all columns and went to **Home > Remove Rows (dropdown) > Remove Duplicates**. Now I'm seeing the expected result. Thanks. – Ethan O'Sullivan Nov 25 '20 at 14:26