
One Destination - All Merge Join Rows

Two Destinations - Fewer Merge Join Rows

Can anyone please explain this behavior for me?

I am generating a count field and then feeding that back into the main stream with the merge join, then performing a conditional split based on the count. It works fine without the update statement, but I get different results when I run it with an update statement from the conditional split. It may also be worth mentioning that there are no nulls in the data, and both pictures are of the same file. Any thoughts are appreciated. Thanks.

  • Are we supposed to be looking at row counts? In one of the pictures the package is finished; in the other it is still running. – Martin Smith Apr 17 '17 at 21:14
  • Yes, I am not sure why the package would process 2188 fewer rows. In the second image the package gets stuck like that, presumably running, but I don't understand why it would require so much more processing to update 8000 rows. – Topher McData Apr 17 '17 at 21:25
  • 1
    You're running at least 8977 individual update transactions through the OLEDB command at the end, that will take time. It is usually quicker to insert to a staging table then do a set based update. – Martin Smith Apr 18 '17 at 04:44

2 Answers


When the OLE DB Command hasn't finished executing the current batch of rows, its preceding component (the Conditional Split) will not send more rows until it finishes processing, and so on back up the data flow. The behavior also depends on the data flow's DefaultBufferSize and DefaultBufferMaxRows properties.

Read more on Data Flow Performance Features

Hadi

I figured I'd update what I learned. It appears that the difference in how many rows were loaded (49430 versus 52220) was due to the DefaultBufferSize and DefaultBufferMaxRows settings in SSIS. Changing them did not improve performance, just how many records were loaded into memory at a time.

As Martin suggested above, the delay with processing the update was due to inefficiency: the OLE DB Command runs one UPDATE per row. For anyone wanting to know what a staging table is, it's just a generic term for a table you create in your database (or create from the package with an Execute SQL Task), load from the data flow, and then use in a single set-based UPDATE statement run from an Execute SQL Task. You can drop the staging table in another SSIS task after the update if you want. I cannot overstate how much of a performance increase this gives you for large updates.
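
As a rough sketch of the pattern (the table and column names below are invented for the example, not taken from the original package): the data flow writes its rows to the staging table through an ordinary OLE DB Destination, and an Execute SQL Task afterwards performs one set-based UPDATE instead of thousands of per-row OLE DB Command calls.

    -- Hypothetical staging table; create it before the data flow runs
    -- (or create it from the package with an Execute SQL Task).
    CREATE TABLE dbo.StagingCounts
    (
        BusinessKey INT NOT NULL,
        RowCnt      INT NOT NULL
    );

    -- After the data flow has loaded dbo.StagingCounts, a single
    -- set-based UPDATE replaces the row-by-row OLE DB Command updates.
    UPDATE t
    SET    t.RowCnt = s.RowCnt
    FROM   dbo.TargetTable AS t
    INNER JOIN dbo.StagingCounts AS s
            ON s.BusinessKey = t.BusinessKey;

    -- Optionally drop the staging table once the update has finished.
    DROP TABLE dbo.StagingCounts;

The set-based UPDATE lets SQL Server process the whole batch in one statement and one transaction, which is why it is so much faster than issuing an individual UPDATE for each row coming out of the conditional split.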