
I'm attempting to load files into a SQL Server database. I'm familiar with bulk import and with generating format files using BCP to reorganize columns. However, I'm trying to do two new things that I've never tried before.

I've been given scores of separate CSVs to load into a database, each file representing one day over the course of a few years. I'd like to use a utility to load them, but they have a varying number of columns (and of course, are separate files!). I was thinking about creating a loading stored procedure and iterating over the files with PowerShell, but even then I can't come up with a good way to handle the varying number of columns.

The columns are consistent; it's just that columns have been added to the files over the years. I'd like to be able to import the files that have fewer columns, with the non-existent columns coming in as NULL values. Does BCP support this?

I'd rather not have to figure out when each column was added and write different stored procedures/scripts for each group of dates (that's the only solution I can come up with!). Any suggestions?

Thanks!

  • Dear future visitors - one wonderful tool to solve this problem is Pandas. You can import all the files into a data frame then ship that to your database with ease. – Philippe Haussmann Apr 14 '22 at 22:50

2 Answers


The only way I can think of to handle this is with procedural coding. You could write either a console app, or an SSIS package with a script task. Either way, you would need to loop through the files, and inside the loop, examine the contents to see which columns are missing, and then import the file supplying some hard-coded value for any columns that are missing.

BCP can handle files with missing columns by using a format file, but it can't do it dynamically (not knowing in advance which columns are missing).
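To illustrate the static case: in a non-XML format file, you list only the fields the data file actually contains and map each one to its table column by ordinal; table columns that appear nowhere in the format file are loaded as NULL (if they're nullable). A sketch for a file that is missing a `Volume` column, against a hypothetical table `dbo.DailyData(TradeDate, Price, Volume)` (the version line should match your bcp version, and the names here are made up):

```
14.0
2
1  SQLCHAR  0  50  ","     1  TradeDate  ""
2  SQLCHAR  0  50  "\r\n"  2  Price      ""
```

You'd then load with something like `bcp dbo.DailyData in day1.csv -f day1.fmt -S yourserver -T`. The catch, as above, is that you need a different format file for each column layout, and you have to know in advance which layout each file has.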

Tab Alleman
    Ok. I think I can make this work by using powershell's import-CSV to add blank columns and BCPs format files to take in those blank columns as nulls. Thanks for the suggestion! – Philippe Haussmann Oct 11 '18 at 20:31
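The "add blank columns" idea in the comment above can be sketched outside PowerShell as well. Here's a minimal, illustrative Python version: rewrite each CSV against a known superset of columns, leaving absent columns empty so the loader can bring them in as NULL. The column names are made up for the example.

```python
import csv
import io

# Hypothetical superset of columns across all the daily files,
# in the order the target table expects. Names are illustrative.
ALL_COLUMNS = ["TradeDate", "Price", "Volume", "Source"]

def pad_csv(in_text):
    """Rewrite a CSV so it always has every column in ALL_COLUMNS.

    Columns absent from the input come out empty, so the bulk loader
    can treat them as NULL on import.
    """
    reader = csv.DictReader(io.StringIO(in_text))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=ALL_COLUMNS, restval="")
    writer.writeheader()
    for row in reader:
        # Only copy known columns; missing ones become "".
        writer.writerow({col: row.get(col, "") for col in ALL_COLUMNS})
    return out.getvalue()

# An early file that only has the first two columns:
old_file = "TradeDate,Price\n2015-01-02,10.5\n"
print(pad_csv(old_file))
```

Once every file has the same shape, a single format file (or a plain `BULK INSERT`) covers all of them.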

For those who follow:

I solved this riddle by using PowerShell to concatenate all the text files into a single, large text file (because we were only loading them all into one DB table anyway). Once I had a single file to upload, I created a Flat-File to OLE-DB package in SSIS, and used its Data Conversion and Derived Column tasks to convert values to the required types for specific fields. SSIS was a very speedy method to upload the data once packaged, blazing through millions of lines in a matter of seconds. Some considerations if you elect to go that route:

If you have a substantial number of text files to upload and you elect to concatenate them, know that the file may become too large to open directly. SSIS has 'data viewer' tools which can be useful for troubleshooting, and those will let you view the file's contents regardless of its size (as you'll be reading it 1,000 lines at a time). However, the data viewer will obviously cut into your speed, so it's probably a good idea to test your script first, and then work on the SSIS package.

Good luck!