1

I use a SSIS component to retrieve data from an API. The API delivers XML, This XML contains data like this:

<tag>
code
</tag>

Linefeeds before and after 'code'

That means that what is entered into my (n)varchar columns in SQL looks like this: CHAR(10)codeCHAR(10)

That messes up a lot, among other things the way things look in the reports. So the CHAR(10) needs to disappear. It needs to be filtered out between the source component and the destination component in my dataflow. I could use derived columns for this but it concerns 9 dataflows with (in total) 385 (n)varchar fields. A lot of work! Is it possible to use a script component that simply does a replace(field,linefeed,'') on each (n)varchar? So that all data passes through this component and exits stripped of linefeeds?

If so, how do I do this? I am pretty ok with SQL, can read most languages but need some help on writing this in a C# of VB.NET. I am designing this in VS2019 and deploying to ADFv2 IS. Targets are SQL Azure databases.

I cannot simply run some SQL after the SSIS job has run since the next run will do a upsert on the data. It will then conclude that CHAR(10)codeCHAR(10) is different from code and insert a new line, violating the unique constraint on code. So that's why I need to do it after getting the data and before writing the data.

Henrov
  • 1,610
  • 1
  • 24
  • 52

2 Answers2

1

You don't necessarily need SSIS to do this. You can use the TRIM function in SQL Server 2017 and later and Azure SQL DB. Insert your XML into a staging table unchanged from the original. Then call a stored procedure that cleans up the XML and optionally inserts it into another table. An example:

--INSERT INTO ...
SELECT 
    TRIM( CHAR(10) + CHAR(13) FROM someXML.value( '(tag/text())[1]', 'VARCHAR(20)' ) ) trimmedTag
FROM xmlTest

TRIM gives you the option to add characters you want to remove. In the above example I've used the CHAR function to remove line feed (10) and carriage return (13) respectively.

From within SSIS you could use an Execute SQL Task to call the proc.

wBob
  • 13,710
  • 3
  • 20
  • 37
  • Unfortunately that won't do the trick. First of all since the next run does a upsert on the data found in the previous run and code is part of the key . I need it to run in the dataflow. SQL Execute task is not available there. Second is that it takes SQL SERVER 2,5 hours to do the search and replace on all 300+ fields. That is for one run, containing about a half a million records. Actual data will run into millions. So that just takes too long.. – Henrov Apr 27 '20 at 06:58
0

I could not find a solution that would not require extensive coding, at the cost of performance. One solution I tried was writing the XML as a CSV, do a search and replace on the CSV en then import that into a table. Though it worked, it made the solution unnessecary complex and it was detrimental in terms of performance. It was fighting symptoms, not curing the issue. So I went and talked to the guys delivering the API and they removed these linefeeds. That turned out to be the only good solution.

Henrov
  • 1,610
  • 1
  • 24
  • 52