0

I have an excel file .xlsx which i will get from client and after certain rows data it will have a blank row and then a generic comment everytime and the starting work of that phrase will be same all the time and not the whole phrase.

I want to execute ssis only till row 5 including header as columns and not want to process line 6 and 7 and every time data changes so i can't even take a range in this case.

I have the flexibility do it either in on prem SSIS or ADF.

Tried using filters but as the text in the cell is splitted in 4-5 lines it only ignores the 1st line.

I would appreciate any help.

enter image description here

enter image description here

TheSacredKiller
  • 121
  • 3
  • 8
  • It's all here https://www.mssqltips.com/sqlservertip/2176/dynamically-find-where-table-data-starts-in-excel-using-ssis/ - could you take this example and add an answer yourself below summarising the article to help others. – Jeremy Thompson Jun 09 '22 at 05:36
  • @JeremyThompson: I will give it a try but this article doesn't show where table data ends because in my case the data will start from first row every time. I will still give it a try and see if i can tweak the code – TheSacredKiller Jun 09 '22 at 06:01
  • Yes, I left that as an exercise for yourself. See how he does this `if ((row[0].ToString() == "HdrCol1")` to work out the starting cell, all you need to do for the ending cell is swap the foreach loop with one line `Dts.Variables["varTabName"].Value = tabName + "A1:C" + dt.Rows.Count - 2;` – Jeremy Thompson Jun 09 '22 at 06:18
  • You could read all 7 lines but add a `Conditional Split` that directs all lines containing _DOB_ to your destination and all lines with an empty _DOB_ to nowhere, the destination will then only receive the 4 required lines, only concern is if the client sends you _Names_ with empty _DOB's_ – Ockert Jun 09 '22 at 07:18

1 Answers1

1

Using Azure data factory data flow, you can use filter transformation to ignore certain rows based on conditions.

Example:

Excel input:

enter image description here

ADF Data flow:

  1. Create a source excel dataset. If your first row is not a header do not enable the First row as header property in the dataset. Here by default, the empty rows will be skipped while reading.

enter image description here

  1. Add source transformation and connect it to the excel dataset. If you do not have a header row in the data, the default column names (like col0) were given to your data.

enter image description here

  1. Add filter transformation after source, to filter out the unwanted rows.

    Expression: not(like({_col0_},'This%'))

enter image description here

Filter output:

enter image description here

Using SSIS, in your Excel source, you can use the SQL command and write a query to pull the data from excel. If you have any mandatory columns, use filters with where condition to pull not null rows from the file (ex: SELECT * FROM [Sheet1$] WHERE [column1] IS NOT NULL). Or use the conditional split task to filter the required rows from the excel source.

NiharikaMoola-MT
  • 4,700
  • 1
  • 3
  • 15
  • The last row where the data is there is spitted into four new lines and when i apply the filter it only removes the first line and still 3 new lines remain. Any idea how to fix this? – TheSacredKiller Jun 09 '22 at 13:42
  • Can you provide the example snip with 4 new lines by editing the question? – NiharikaMoola-MT Jun 09 '22 at 15:16
  • Added the screenshot . The text is actually showing in 5 different lines when i copy that in text editor.All that long text is just wrapped. – TheSacredKiller Jun 09 '22 at 17:14
  • I have tried from my end by adding multiple lines in a single column. In my dataset preview, it shows as a single line. And tried filtering the first word as shown in my answer, and it is working fine for me. Can you check if there are any hidden rows in the files? – NiharikaMoola-MT Jun 10 '22 at 11:36
  • Also, you can try removing the line breaks by replacing **\n** with nothing in the first column using the `derived column` expression. – NiharikaMoola-MT Jun 10 '22 at 11:39
  • Hi. I will check for hidden lines as well as with derived column and will let you know. Thanks again – TheSacredKiller Jun 10 '22 at 16:24