0

I'm trying to transfer data from Excel into an SQL Server database. Each Excel file has a sheet for each month, and each of these sheets has the following structure where, for every "Production Unit", the values for the fields are stored for each day of the month. The field names are exactly the same for all.

... (Emptry/Irrelevant Rows)
Unit 1 |       |      1|      2|      3|      4|      5| ...
Field1 |KG     |   95.2| 254.12|  125.0|  125.0|  125.0| ...
Filed2 |CM     |   95.2| 254.12|  125.0|  125.0|  125.0| ...
... (Other Fields)
... (Emptry/Irrelevant Rows)
Unit 2
Field1 |KG     |   95.2| 254.12|  125.0|  125.0|  125.0| ...
Filed2 |CM     |   95.2| 254.12|  125.0|  125.0|  125.0| ...
... (Other Fields)
... (Emptry/Irrelevant Rows)
Unit 3
...

There is a separate Excel file for each year. I would like to get all this data into the database in the following format:

Unit |Date       |Field1 |Field2 | ...
1    |18/02/2008 |   23.5|  12.56| ...
1    |19/02/2008 |   23.5|  12.56| ...
...
3    |01/05/2012 |   23.5|  12.56| ...
...

I know how to take a specific segment of a sheet and then unpivot/pivot to rotate the data, but before that, I have 2 issues.

One is to rename the column names from simple numbers into full dates before (or after?) I unpivot them. So some way of appending the month (maybe based on a case statement using the sheet name) and the year (maybe from a variable that I could change for every Excel file).

The second is how to bring in the Unit data separately. There are quite a few calculated cells and empty/irrelevant rows between the Units. I would either need to specify the range for each, or bring in all and then eliminate the rows in between (which might have to be based on row numbers, since I'm not sure what other logic would apply to all). But then how do I label them as Units 1/2/3/etc? There are 5.

I'm not that familiar with SSIS and I feel a bit out of my depth here. Should all this be done in a stored procedure? And if so, is there even a need for SSIS? If it would be in SSIS where would the procedure reside? Or is there an easier way of doing all this?

Serital
  • 343
  • 3
  • 13
  • SSIS can handle it but I'd look at using a Script source to clean up the ugliness. It's going to be the best fit for the procedural logic you need http://stackoverflow.com/questions/15980580/script-task-in-ssis-to-import-excel-spreadsheet – billinkc Nov 27 '13 at 02:55
  • I strongly suggest you do not store your data in that pivoted format. You should store it in the normalised format that it appears in the Excel file, then if you wish you can use a view to pivot it around with Field1, Field2 etc. as columns. There are many tools and methods you can use to import this, and usually the best one is the one you are most comfortable with (as you'll need to fix and enhance it as time goes by) – Nick.Mc Nov 27 '13 at 23:31
  • @ElectricLlama Are you suggesting I use a table structure where there is a "Year" column, a "Field" column and 365 columns for each day of the year? And enter the values for the fields per year in rows? I don't see how that would be useful for me (besides it doesn't solve the column naming problem unless I use monthly tables with 31 columns for dates). Also, I would have assumed normalised format is the one I mentioned where new daily data is entered into a new row, not into the next column? – Serital Nov 28 '13 at 13:34
  • Sorry I didn't look at your input file in detail. Definitely not a column for each day. Your target format looks OK. I would consider pivoting Field1 into a units column. Although if there are only a few units and they are unlikely to change and you have a lot of data then your target table definition is probably just fine. The way I would do this is import into a staging table matching the input file via SSIS or BCP or BULK INSERT, then use stored procedures to pivot it and clean it into your table. Just because I'm more comfortable using a T-SQL script than a .Net script. – Nick.Mc Dec 01 '13 at 22:03

0 Answers0