Hi, I have a question about SSIS. I want to load multiple CSV files into a SQL Server table using an SSIS package. When loading, only the rows from the header row onwards should be considered.

The source path has 3 CSV files with fixed header columns and data, but each file has a file description and creation date before the headers. In one file the description is in row 2 and the header row starts at row 4, followed by the data. In another file the description is in row 1 and the headers and data start at row 9. In the third file the description starts at row 5 and the header row is row 7. The column headers are the same in all CSV files.

File locations:

  1. C:\test\a.csv
  2. C:\test\b.csv
  3. C:\test\c.csv

a.csv contains data like below:

Here the description and date come in rows 2 and 3. The actual data starts from the 4th row onwards.
descritiion:empinfromationforhydlocation
creadeddate:2018-04-20

id |name|loc
1  |a   |hyd

b.csv contains data like below:

Here the description and date come in rows 1 and 2. The actual data starts from the 9th row onwards.
descritiion:empinfromationforhydlocation
creadeddate:2018-04-21

id |name|loc
10  |b   |chen

c.csv contains data like below:

Here the description and date come in rows 5 and 6. The actual data starts from the 9th row onwards.
descritiion:empinfromationforhydlocation
creadeddate:2018-04-21

id |name|loc
20  |c   |bang

Based on the above 3 files, I want to load the data into the target SQL Server table emp:

id  | Name |Sal
1   |a     |hyd
2   |b     |chen
3   |c     |bang

Here is what I tried in the package:

  1. Create the variables:
    • filelocationpath: C:\test\
    • filename: C:\test\a.csv
  2. Drag and drop a Foreach Loop container:
    • enumerator type: Foreach File Enumerator
    • directory: C:\test
    • variable mapping: map the result to the filename variable
    • file type: *.csv
    • retrieve file name: name and extension
  3. Inside the Foreach Loop container, I added a Data Flow Task and created a Flat File connection. I configured the connection using one of the files with header rows to skip set to 1, added a Data Conversion for the required columns, and mapped the output to the OLE DB destination table. I also created an expression on the Flat File connection so the file name is passed dynamically.

After executing the package, the 2nd file failed because of the description and date information:

  • the description and dates do not always come in fixed rows in the next day's files
  • the description and dates come in different rows in each file

Is it possible to find dynamically how many rows need to be skipped and pass that count to the header rows to skip setting? Is that possible in SSIS?

Please tell me how to achieve this in SSIS.

Yahfoufi
gbalu
  • Can you get a native English speaker to help you re-write your question? I can't understand what you're trying to ask. – Tab Alleman Sep 23 '19 at 15:11
  • It is probably easiest just to remove the 3 lines from the file using a simple command script – Hogan Sep 24 '19 at 12:54
  • @TabAlleman Seems that the OP needs to work with CSV files that each have a variable number of "free text" rows at the start, before the actual header + content of the CSV. – EzLo Sep 24 '19 at 13:06
  • I'd recommend clearing the non-CSV top rows with a simple Script Component Task **before** the actual parsing of the file on your Data Flow. – EzLo Sep 24 '19 at 13:08

3 Answers

If the number of rows to skip is constant, search YouTube for this video: Delete Top N Rows from Flat File in SSIS Package.
If the number is not known in advance, first determine how many useless rows there are, store that count in a variable, and pass that value to the package that processes the file.
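
When the count is not known, one option in the spirit of the comments above is to drop the leading rows by looking for the header text instead of a fixed number. The following is only a minimal sketch: `PreambleStripper`, `FromHeader` and `Clean` are hypothetical names, and the header prefix `"id |"` is an assumption taken from the sample files in the question.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class PreambleStripper
{
    // Returns the lines from the header row onward; every line before it is dropped.
    // If no line starts with headerPrefix, the result is empty.
    public static IEnumerable<string> FromHeader(IEnumerable<string> lines, string headerPrefix)
    {
        return lines.SkipWhile(line =>
            !line.TrimStart().StartsWith(headerPrefix, StringComparison.OrdinalIgnoreCase));
    }

    // Writes a cleaned copy of sourcePath to destPath.
    // The two paths must differ, because the source is still being read while the copy is written.
    public static void Clean(string sourcePath, string destPath, string headerPrefix)
    {
        File.WriteAllLines(destPath, FromHeader(File.ReadLines(sourcePath), headerPrefix));
    }
}

public static class PreambleStripperDemo
{
    public static void Main()
    {
        // In-memory stand-in for a file shaped like the samples in the question.
        string[] sample = { "description:x", "createddate:2018-04-20", "", "id |name|loc", "1  |a   |hyd" };
        foreach (string line in PreambleStripper.FromHeader(sample, "id |"))
            Console.WriteLine(line);
    }
}
```

A cleaned copy produced this way always has the header in row 1, so the Flat File connection would no longer need a per-file skip count.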

Andriy

Workaround

  1. In the Flat File connection manager, uncheck the "Column names in the first data row" option, then go to the Advanced tab and define the column metadata manually (column name, length, ...)
  2. Within the Data Flow Task, add a Script Component
  3. In the Script Component Editor, go to the Inputs and Outputs tab and add an output column of type Boolean
  4. In the script, compare the first column value of each row with the column header name. Until they match, set the output column to False; once a row matches the header, set the output column to True for all remaining rows
  5. After the Script Component, add a Conditional Split to filter rows based on the generated column (rows with a False value must be ignored)
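
The flagging logic from step 4 can be sketched outside SSIS as a small stateful class. This is an illustration under assumptions, not the exact script: `HeaderFlagger` and `IsDataRow` are hypothetical names, the header name `"id"` comes from the sample files, and the header row itself is flagged False so that the Conditional Split drops it too.

```csharp
using System;

public class HeaderFlagger
{
    private readonly string headerName;
    private bool headerSeen;   // becomes true once the header row has passed

    public HeaderFlagger(string headerName)
    {
        this.headerName = headerName;
    }

    // Returns false for every row up to and including the header row, true afterwards.
    public bool IsDataRow(string firstColumnValue)
    {
        if (headerSeen)
            return true;

        string value = (firstColumnValue ?? string.Empty).Trim();
        if (string.Equals(value, headerName, StringComparison.OrdinalIgnoreCase))
            headerSeen = true;   // rows after this one are data

        return false;
    }
}

public static class HeaderFlaggerDemo
{
    public static void Main()
    {
        var flagger = new HeaderFlagger("id");
        // First-column values as they might arrive for a file like a.csv.
        foreach (string value in new[] { "description:...", "createddate:...", "id ", "1  " })
            Console.WriteLine(value.Trim() + " -> " + flagger.IsDataRow(value));
    }
}
```

Inside the Script Component, the same check would sit in the row-processing method and assign its result to the Boolean output column added in step 3.
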
Hadi

Create a new file connection with a single column for the same file.

Add a Data flow task with a transformation script component.

Attach a ReadWrite variable to the Script Component to hold the row index (skiprows in the example code), and check the first characters of each row in the ProcessInputRow method.

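// State kept across the rows of the single-column input.
bool checkRow;   // true until the header row has been seen
int rowCount;    // rows read so far, up to and including the header row

public override void PreExecute()
{
    base.PreExecute();
    checkRow = true;
    rowCount = 0;
}

public override void Input0_ProcessInputRow(Input0Buffer Row)
{
    if (checkRow)
    {
        rowCount++;
        // Guard against NULL values before calling StartsWith.
        if (!Row.Data_IsNull && Row.Data.StartsWith("id |"))
            checkRow = false;   // header found: stop counting
    }
}

public override void PostExecute()
{
    base.PostExecute();
    // ReadWrite variables can only be assigned in PostExecute.
    Variables.skiprows = rowCount;
}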

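Then set the 'HeaderRowsToSkip' property of the original flat file connection through an expression that uses this variable. Because rowCount is incremented before the check, the value includes the header row itself.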

If the files are going to be very large, you can force the script to fail as soon as it has found the header row (with a division by zero, for example), so the rest of the file is not read. Add an error event handler and set the system variable "Propagate" to false (@[System::Propagate]=false).
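
As an alternative to forcing a failure, the header position could be found in plain C# before the data flow runs, reading the file lazily so that nothing after the header is touched. This swaps the Script Component for a standalone helper and is only a sketch: `HeaderLocator`, `FindHeaderRow` and `FindHeaderRowInFile` are hypothetical names, and unlike the code above it returns 0 when no header is found.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class HeaderLocator
{
    // Returns the 1-based position of the first line starting with headerPrefix,
    // or 0 if no such line exists. Enumeration stops at the header row.
    public static int FindHeaderRow(IEnumerable<string> lines, string headerPrefix)
    {
        int rowCount = 0;
        foreach (string line in lines)
        {
            rowCount++;
            if (line.StartsWith(headerPrefix, StringComparison.Ordinal))
                return rowCount;   // the remaining lines are never read
        }
        return 0;
    }

    // File.ReadLines streams the file line by line instead of loading it whole.
    public static int FindHeaderRowInFile(string path, string headerPrefix)
    {
        return FindHeaderRow(File.ReadLines(path), headerPrefix);
    }
}

public static class HeaderLocatorDemo
{
    public static void Main()
    {
        string[] sample = { "description:x", "createddate:2018-04-21", "", "id |name|loc", "10  |b   |chen" };
        Console.WriteLine(HeaderLocator.FindHeaderRow(sample, "id |"));
    }
}
```

In a Script Task placed before the Data Flow Task, the result could be stored with something like `Dts.Variables["User::skiprows"].Value = ...`, assuming the variable is listed as ReadWrite for that task.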

Borja