1

I have a folder of many CSV files in which the first eight columns have the same header, but varying amounts of columns after the first eight. I am attempting to use Queries and Connections in Excel 2016 to complete this.

Here is an example of the format of the CSV files (pretend there are string values in A - H):

----------------------------------------------------------------------------------------------------------------
|     A     |  B  |  C  |  D  |  E  |  F  |  G  |  H  | Company 1 | Value (1) | Company 2 | Value (2) | etc... |  
----------------------------------------------------------------------------------------------------------------  
| Product 1 |     |     |     |     |     |     |     |    .05    |   25.00   |    .08    |   14.00   |        |  
----------------------------------------------------------------------------------------------------------------  
| Product 2 |     |     |     |     |     |     |     |    .16    |   43.00   |    .06    |   18.00   |        |  
----------------------------------------------------------------------------------------------------------------  

Once again: Columns A - H are the same for each CSV file, but each file has a different amount of Companies/Values (the company names are different for each file).

I have done the necessary steps for one CSV file and was hoping there was some way I could possibly use it as a template for the others.

Here are the steps I need to do in Excel's Query editor:

  • Delete some of "A-H" columns (these headers match for all files, shouldn't be hard)
  • Delete all "Value (#)" columns - the number of these is different in each CSV file
  • Unpivot all the "Company #" columns (now is every column after matching columns A - H because "Value (#)" columns were deleted in previous step)
  • Rename column with all company names to "Company"
  • Rename column with all values previously under each company to "Cost"


Here is the applied steps in the 'Advanced Query Editor':

#"Promoted Headers" = Table.PromoteHeaders(Source, [PromoteAllScalars=true]),
#"Changed Type" = Table.TransformColumnTypes(#"Promoted Headers",{{"A", type text}, {"B", type text}, {"C", type text}, {"D", type text}, {"E", type number}, {"F", type text}, {"G", Currency.Type}, {"H", Currency.Type}, {"Company 1", type text}, {"Value (1)", Currency.Type}, {"Company 2", type text}, {"Value (2)", Currency.Type}, {"Company 3", type text}, {"Value (3)", Currency.Type}, {"Company 4", type text}, {"Value (4)", Currency.Type}, {"Company 5", type text}, {"Value (5)", Currency.Type}, {"Company 6, type text}, {"Value (6)", Currency.Type}, {"Company 7", type text}, {"Value (7)", Currency.Type}, {"Company 8", type text}, {"Value (8)", Currency.Type}, {"Company 9", type text}, {"Value (9)", Currency.Type}}),
#"Removed Other Columns" = Table.SelectColumns(#"Changed Type",{"A", "C", "D", "E", "Company 1", "Company 2", "Company 3", "Company 4", "Company 5", "Company 6", "Company 7", "Company 8", "Company 9"}),
#"Unpivoted Columns" = Table.UnpivotOtherColumns(#"Removed Other Columns", {"A", "C", "D", "E"}, "Attribute", "Value")

Is there any way to import the folder of CSV files and apply the steps stated above to each one so there is just one query of all the CSV files? (brief example of how I want it to look below)

----------------------------------------------------
|     A     |  C  |  D  |  E  |  Company  |  Cost  |  
----------------------------------------------------
| Product 1 |     |     |     | Company 1 |  .05   |
----------------------------------------------------
| Product 1 |     |     |     | Company 2 |  .08   |
----------------------------------------------------
| Product 2 |     |     |     | Company 1 |  .16   |
----------------------------------------------------
| Product 2 |     |     |     | Company 2 |  .06   |
----------------------------------------------------
Grant Meehan
  • 23
  • 2
  • 16

1 Answers1

3

I believe so. This seems to work.

Start by selecting Get Data (or New Source) > File > Folder and browse to your folder with the csv files in it and select it and click OK. Then click OK again.

You'll see something like the following.

enter image description here

Click the Edit button.

Right-click the Content column and Remove Other Columns. You'll see something like this.

enter image description here

Expand the Content column by clicking the button at its top right and click OK in the dialog box that appears, like below.

enter image description here

On the left side of your screen, under Queries, You'll see something like this:

enter image description here

The query that I've highlighted is where you want to do your work. Note that it has a table icon and starts with Transform Sample File. Click on it.

Under APPLIED STEPS on the right side of the screen, click on Source.

In the formula bar, you should see something like this: = Csv.Document(#"Sample File Parameter1",[Delimiter=",", Columns=12, Encoding=1252, QuoteStyle=QuoteStyle.None]). Delete the part, Columns=12, (everything from the C through and including the comma--it most likely has a different number than 12).

enter image description here

Then remove the columns you want removed from your table.

enter image description here

Click Transform > Use First Header Row as Headers dropdown arrow > Use Headers as First Row.

enter image description here enter image description here

Click Transform > Transpose to flip the table.

Select the Column1 dropdown arrow > Text Filters > Does Not Begin With > and Type Value and click OK.

enter image description here

Click Transform > Transpose to flip the table back to its original orientation.

Click Transform > Use First Row as Headers.

Select all of the columns except your Company columns and click the Unpivot Columns dropdown arrow > Unpivot Other Columns. (Don't select and pivot the Company columns directly. Doing it indirectly, like I said, should capture the varying numbers of company columns from the various files.)

enter image description here

Under APPLIED STEPS on the right side of the screen, delete each Changed Type step. If you don't, they can cause your query to choke.

Now go back to this query, which would've been your original "main" query. Of course, yours is named something else, but you'll know it by its location. Click on it.

enter image description here

You'll likely see an error like this.

enter image description here

Delete any Changed Type steps in the APPLIED STEPS on the right side of the screen.

You should see something like this.

enter image description here

Marc Pincince
  • 4,987
  • 6
  • 19
  • 40
  • This worked perfectly! I found some other issues with random necessary columns that I didn't put in the initial post but I just incorporated those issues into the step where you filter the rows. You saved me from doing the transformations of 850+ queries individually. – Grant Meehan Jun 14 '18 at 16:54