0

I am importing an Excel data extract into MSQuery using ODBC which has data that appears as below:

Col1   Col2   Col3   Col4   Col5   Col6   Col7   Col8   Col9   Col10   Col11
----------------------------------------------------------------------------
null   null   null   null   null   null   null   Units  Units  %Reach %Reach
Mkts   Dept   SCat   Cat    Seg   Brnd   UPC   4 W/E 10/06/17   4 W/E 11/03/17   4 W/E 12/01/17   4 W/E 02/02/17
ABC   Dept1  Cat1   FOOD   VEGG   XWAR   3939493   231.11   883.43   49.13
ABC   Dept1  Cat1   FOOD   VEGG   XWAR   5946942   422.32   222.64   91.84
ABC   Dept1  Cat1   FOOD   VEGG   XWAR   4938843   543.34   null     null
CDE   Dept2  Cat2   BEV    NVEG   SAG    0549403   null     2        null
DEF   Dept3  Cat3   UTL    DARY   MUG    4032850   null     null     null

Sometimes the data files may contain extra null rows on top with some one of the starting cells having some text.

Col1   Col2   Col3   Col4   Col5   Col6   Col7   Col8   Col9   Col10   Col11
----------------------------------------------------------------------------
sumtxt null   null   null   null   null   null   null   null   null    null
null   null   null   null   null   null   null   null   null   null    null
null   null   null   null   null   null   null   Units  Units  %Reach %Reach
Mkts   Dept   SCat   Cat    Seg   Brnd   UPC   4 W/E 10/06/17   4 W/E 11/03/17   4 W/E 12/01/17   4 W/E 02/02/17
ABC   Dept1  Cat1   FOOD   VEGG   XWAR   3939493   231.11   883.43   49.13
ABC   Dept1  Cat1   FOOD   VEGG   XWAR   5946942   422.32   222.64   91.84
ABC   Dept1  Cat1   FOOD   VEGG   XWAR   4938843   543.34   null     null
CDE   Dept2  Cat2   BEV    NVEG   SAG    0549403   null     2        null
DEF   Dept3  Cat3   UTL    DARY   MUG    4032850   null     null     null

Now, the row that is shown below is the Facts row:

null   null   null   null   null   null   null   Units  Units  %Reach %Reach

And the row that is below it is the Dimensions row:

Mkts   Dept   SCat   Cat    Seg   Brnd   UPC   4 W/E 10/06/17   4 W/E 

I want to somehow delete the top null rows, concatenate the Dimension rows with the Fact rows to get a single row. Then promote this row as the Header row. e.g.

Mkts   Dept   SCat   Cat    Seg   Brnd   UPC   Units~4 W/E 10/06/17   Units~4 W/E 11/03/17    %Reach~4 W/E 12/01/17   %Reach~4 W/E 02/02/17

Note: The Dimension row may vary and their names may be different in each data extract. Similarly, the Facts row may vary and their names may be different in each data extract.

Is this possible to do this transform in SQL, that too in MS Query, so that i get a clean table like this:

Mkts   Dept   SCat   Cat    Seg   Brnd   UPC   Units~4 W/E 10/06/17   Units~4 W/E 11/03/17    %Reach~4 W/E 12/01/17   %Reach~4 W/E 02/02/17
----------------------------------------------------------------------------
ABC   Dept1  Cat1   FOOD   VEGG   XWAR   3939493   231.11   883.43   49.13
ABC   Dept1  Cat1   FOOD   VEGG   XWAR   5946942   422.32   222.64   91.84
ABC   Dept1  Cat1   FOOD   VEGG   XWAR   4938843   543.34   null     null
CDE   Dept2  Cat2   BEV    NVEG   SAG    0549403   null     2        null
DEF   Dept3  Cat3   UTL    DARY   MUG    4032850   null     null     null
sifar
  • 1,086
  • 1
  • 17
  • 43
  • This is periodic process? Last 4 field names indicate non-normalized data structure and presume will change with each import. I doubt SQL statement can handle this. – June7 Sep 30 '19 at 22:39
  • Which would you prefer? Clean up the Excel data before it is imported? Or clean up the msaccess table after it has arrived? In each case, find the row having Col1 = "Mkts" and get the values of Col8, Col9, Col10, Col11. Delete all rows "Mkts" row and prior, OR Delete all rows Col7 is Null or = "UPC" (devise some reasonable deletion rule) . Rename each column to "Mkts", ... for cols 1 -- 7, and prefix "Units~" to cols 8-9, and "%Reach~" to cols 10-11. – donPablo Oct 01 '19 at 00:09
  • @donPablo the problem is Mkts or Units may not be present in every extract. – sifar Oct 01 '19 at 04:55
  • Is the row Units/Units/%Reach/%Reach present in each extract? With exactly those same spellings? Then use it to signal that the next row has the column heading names. – donPablo Oct 01 '19 at 15:46
  • No @donPablo, thats the problem! The facts may not be same across data extracts. – sifar Oct 01 '19 at 15:50
  • I would suggest fixing the data in excel before doing the MSQuery (msquery does not allow complex vba to be done before running the import sql). In Excel, the extra rows and the "column headers" are physically at the top of the sheet. Whereas, in MSAccess as a sql table, there is no inherent ordering of the rows or RowNum or RowID. – donPablo Oct 02 '19 at 15:22
  • If i dont use MSQuery but use ADO or SQL in MSAccess, would it be possible to write such an SQL? – sifar Oct 02 '19 at 15:25
  • SQL alone will not solve the problem of fixing the data; VBA is also needed. Yes, using ADO will allow creation of a new sheet with appropriate column headings, and copying only data rows to the new sheet. Then MSQuery can import from the new sheet. Look at for some initial ideas -- https://stackoverflow.com/questions/49832151/how-to-create-a-new-sheet-table-in-an-xlsx-file-using-ado-in-excel-vba – donPablo Oct 03 '19 at 16:39
  • Thanks @donPablo. I am aware of Adox and VBA. So probably will use VBA.:-) – sifar Oct 03 '19 at 17:22

1 Answers1

1

Rough outline--

' FindFolder that has the XLS files to import

' myFile = Dir *.xls

' Do While myFile <> ""

   ' Open the xls file

   ' if sheetName = "Fixed" then delete that sheet ' we will recreate it

   ' Select sheetName to import

   ' Activate that sheet

   ' Find Facts row and put values into one-based array FactsRow()
   ' Find Dimensions row and put values into one-based array DimenRow()
   ' Save row# of Dimensions row

   ' If ColHeaders ok (no Facts or Dimen rows), then 
      ' MSQuery import from Existing sheet to MSAccess
      ' jump to Dir stmt
   ' endif

   ' Create new sheet, and columns using FactsRow and DimenRow, per the following--
   ' https://stackoverflow.com/questions/49832151/how-to-create-a-new-sheet-table-in-an-xlsx-file-using-ado-in-excel-vba

   ' copy DataRows from DimenRowNum+1 thru end to Fixed sheet

   ' Save and close this XLS

   ' do MSQuery to import from Fixed sheet to MSAccess

   ' myFile = Dir  ' get filename of next xls file
' Loop  ' until all xls files processed
donPablo
  • 1,937
  • 1
  • 13
  • 18