0

A tale of two cities almost...I have 17,000 rows of data that come in as a pair of strings in 2 columns. There are always 5 item numbers and 5 Item Unit counts per row (unit counts are always 4 characters). They have to match up unit and item or it's invalid. What I'm trying to do is "unpivot" the strings into individual rows - Item Number and Item Units

So here's an example of one row of data and the two columns

  • Record ID Column: 0
  • Item Number Column: A001E10 A002E9 A003R20 A001B7 XA917D3
  • Item Units Column: 001800110002000300293

I wrote a C# windows app test harness to unpivot the data into individual rows and it works fine and dandy. So it basically unpivots the data into 85,000 (5 times 17,000) rows and displays it to me in a grid which is what I expect (ID, Item Number and Item Units).

  • 0 | A001E10 | 0018
  • 0 | A002E9 | 0011
  • and so on...

In my SSIS app I added a script task to process this same data and basically used the same code that my test harness uses. When I run my task I can see it loads the 17,000 rows but it only generates 15,000 +/- on the output so obviously something isn't right.

What I'm thinking is that I don't have the script task setup correctly even though it is using the same code that my test harness uses in that it's dropping records for some reason.

If I go back into my task and give it a particular record ID that it didn't get in the first pass, it will process that ID and generate the right output. So this tells me that the record is ok but for some reason it misses it or drops it on the initial process. Maybe something to do with buffers?

  • Where do you see 15,00 records? in the SSIS interactive window in BIDS? What's the actual record count in the table that it ends up in? – Nick.Mc Sep 13 '15 at 23:19
  • Sorry...forgot to say where...yes in BIDS after it runs through the script task. Then when I query the destination table it has the same count as the script task result – Tate Antrim Sep 14 '15 at 00:08
  • Are you using the script task inside a data flow as a transformation or on the control flow as a regular script task? – Vinnie Sep 14 '15 at 00:52
  • As a transformation. I just ran an extract batch with the whole package. So this particular batch has 17,425 rows. When it ran with the whole package it generated 14,250 rows. My test harness says it should have generated 77264 rows. – Tate Antrim Sep 14 '15 at 00:55
  • So I ran it by itself and it generated 77264 rows. So that's the weird part. If it runs as part of the whole datawarehouse extract it doesn't generate all the rows. Run it alone and it generates all the right rows. That's interesting...not sure why that would be. It is with a bunch of other disconnected packages in the same sequence container all running at once...hmmm...maybe pulling it out and running it separate? – Tate Antrim Sep 14 '15 at 00:57
  • Sorry, not packages, other dataflow tasks. – Tate Antrim Sep 14 '15 at 00:59
  • As the question stands, I think it's unsolveable. There are too many vaguely defined pieces and interactions. My shot in the dark is that your script task is an asynchronous process: meaning 0 to N rows in yields 0 to M rows out. The code you have in there to match row n to n+1 is broken - possibly is assumes an ordering that doesn't exist or a process elsewhere is breaking your data. – billinkc Sep 14 '15 at 01:21
  • The way to improve this question is to provide concrete, reproducible scenario that someone on SO will be able to perform and therefore diagnose. – billinkc Sep 14 '15 at 01:22
  • Surely your test harness is appreciably different, i.e. it's not using any of the SSIS object model?... as always I suggest doing the processing in the database, not inside SSIS. I am the SSIS troll. – Nick.Mc Sep 14 '15 at 05:16
  • The test harness is a windows app that I build to test my parsing and unpivot logic. But all of the code that parses and unpivots the strings is directly copied over to the ssis application - there is no difference between the code used in the test harness and the code used in the ssis script task. The task is not running asynchronously - if I try that the ssis application hangs. – Tate Antrim Sep 14 '15 at 11:37
  • Sorry - hit enter - the script task takes the rows coming in and creates all new rows going out based. – Tate Antrim Sep 14 '15 at 11:38

1 Answers1

1

Well - I figured it out.

We have a sequence task with tons of dataflow tasks inside it that are running in parallel. We're relying on the engine to prioritize and handle the data extract and load correctly. However, this one particular script task is not handled by the engine correctly within that sequence container.

The clue was that you could run the script task itself outside of the whole process and it worked fine. So we pulled the script task out of the sequence task and put it by itself after the sequence task and now it runs correctly.