
I'm new to SSIS development. I need some guidance from experts on SSIS. Following is my list of questions:

We have tab-delimited txt or dat files ranging in size from 1 GB to 25 GB. Sometimes these files contain invalid rows, invalid data types, or overly long string values, which causes failures while importing the file data into SQL tables.

Can anyone advise how to perform data profiling on a file before loading it into the actual SQL tables, covering things like:

  • Number of invalid rows which have NULL values in a column
  • Number of rows present in the file
  • Number of rows with a wrong data type value in a column; for example, there is a chance of the file holding string data in an int field
  • Sometimes a row delimiter appears in between columns. For example, if a file has 10 columns and a row delimiter appears between columns 5 and 6, the current row's data for columns 6-10 moves to the next row. This scenario should be checked in the data profiler.
  • Is there a way in SSIS to identify whether all the rows have columns of equal length, and throw an error if not? Identifying those rows needs to be fast.
user145610
  • Very hypothetical situation! Needs in-depth analysis to provide appropriate feedback. – Murtaza May 08 '14 at 05:58
  • Define 'invalid row'. I suggest you first import all rows into a single `VARCHAR` field to check equal column length. If that passes, you can go to the next step of importing into multiple `VARCHAR` fields in a table and perform checks on them using things like `IS NULL`, `ISNUMERIC`, `LEN` etc. Exactly which version of SQL? SQL Server 2012 has some advanced features. The other way to do it is to write an external console application that first checks the files using string functions and puts each one into a pass or fail folder. – Nick.Mc May 08 '14 at 09:50

2 Answers


I would import the text files into SQL tables with all columns set to NVARCHAR(4000). IMO this is best practice even in production - you can never trust text files.

Having done that, you can point the SSIS Data Profiling Task at the tables. With minimal effort it will give you a rich view of the data in the files.

Note that SSIS cannot handle your issue with row delimiters inside columns. If your source files are of such poor quality, I would abandon the SSIS Flat File Source and consume the data using a Script Component as a source, with carefully crafted split functions etc. to try to parse the files. This is not a trivial task.
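To illustrate the kind of repair logic such a custom source would need, here is a rough sketch in Python rather than an SSIS Script Component. It assumes, as in the question, that the stray row delimiter falls on a column boundary, so a broken logical row can be rebuilt by accumulating fields from consecutive physical lines until the expected column count is reached; `reassemble_rows` is a hypothetical name:

```python
def reassemble_rows(lines, expected_cols, delimiter="\t"):
    """Rebuild logical rows from physical lines, merging lines that
    were split by a stray row delimiter at a column boundary."""
    pending = []
    for line in lines:
        # Accumulate fields until the logical row is complete.
        pending.extend(line.rstrip("\r\n").split(delimiter))
        if len(pending) == expected_cols:
            yield pending
            pending = []
        elif len(pending) > expected_cols:
            # Too many fields: the break did not fall on a boundary,
            # or the data is corrupt in some other way.
            raise ValueError(f"unrecoverable row near: {pending!r}")
    if pending:
        raise ValueError(f"incomplete trailing row: {pending!r}")
```

A break mid-field (splitting one value in two) would need extra handling, such as joining the last pending field with the first field of the next line - which is part of why this is not a trivial task.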

Good luck!

Mike Honey

When you open a Data Flow Task, instead of loading the data into an existing table, you can create a new table from inside your destination task. It will automatically create a table with exactly the same data types/sizes as your source file. After that, you can transform and manipulate the data as you wish, and then load it into your final table.

Newton