
I have general ledger transactions down to the individual journal bookings. I am working on a process to rebuild these bookings from the bottom up using tables and other information located elsewhere. To start, I have an SSIS package that pulls in 3-4 different "Divisions" worth of data.

In one case, there are over 600k lines, and I'll need at most 50k of them. Loading all 600k into a table takes a while, and I was looking for a way to head that off. If I were doing it in SQL Server, I'd write something like:

SELECT * FROM C601
WHERE (COST_CENTER = '5U' AND ACCOUNT = 1100001)
OR (COST_CENTER = '5U' AND ACCOUNT = 1300001)

I'd have about 12-13 WHERE conditions in total, but they would cut the data down to maybe 10% of the original rows. Is there a way to filter the flat file load in SSIS, as I would with the SQL above, so that far fewer rows reach the SQL Server table?
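
For illustration, the full filter would follow the same pattern, something like this (the second cost center and the extra accounts below are placeholders, not my real list):

SELECT * FROM C601
WHERE (COST_CENTER = '5U' AND ACCOUNT IN (1100001, 1300001))
OR (COST_CENTER = '6U' AND ACCOUNT IN (1400001, 1500001)) -- placeholder values for illustration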

Vaslo
  • You can use a [Conditional Split Transformation](https://learn.microsoft.com/en-us/sql/integration-services/data-flow/transformations/conditional-split-transformation?view=sql-server-2017) to filter the rows, yes. The whole file will still need to be read (you can't stop that), but you can use the split to direct only the rows you want to the destination. – Thom A Aug 12 '19 at 18:17
  • Is this a one-off job or a periodic process? If it's a one-off, you could use the CSV/SQL filter functions available in Notepad++ via a plugin. – wp78de Aug 12 '19 at 18:24
  • Larnu - This was the only solution I was able to find online and the way I was going to go, but I was hoping there was an easier way. Where do you push all the transactions you don't care about? I'd have no interest in storing them, as they would just clutter up the DB with 5 million more transactions that don't affect my business. – Vaslo Aug 12 '19 at 18:35
  • wp78de - This is every month, and during the close procedure I may update 1-2x per day, so I'm trying to make it repeatable. The exceptions I need out of "C601" are very specific, so SQL would be helpful there, at least as far as I know the language. If it were a one-off, to your point, I wouldn't waste much time on it. – Vaslo Aug 12 '19 at 18:38
  • @Vaslo... you do not need to direct the transactions that you don't want anywhere. Essentially they can die in the conditional split. – KeithL Aug 12 '19 at 19:47
  • Provided you do something with one of the flows that comes out of the Conditional Split, SSIS is happy; the sets you don't use will effectively be discarded. Conditional Split is often used to filter the results, probably more so than to actually "split" the data to multiple destinations. – Thom A Aug 12 '19 at 19:52

1 Answer


Use a Conditional Split Transformation

1st approach

Add an expression similar to the following:

[COST_CENTER] == "5U" && ([ACCOUNT] == 1100001 || [ACCOUNT] == 1300001)

2nd approach

Or you can add two split expressions, as follows:

[COST_CENTER] != "5U"

And

[ACCOUNT] != 1100001 && [ACCOUNT] != 1300001

Rows that match either expression are the ones you don't need, so you can leave those two outputs unattached and the rows will simply be discarded. Then you can use the Conditional Split default output to get the desired result.
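
As a sanity check (a sketch using the table and columns from the question), the rows that reach the default output are the same ones the original query returns:

SELECT * FROM C601
WHERE NOT (COST_CENTER <> '5U') -- not caught by the first split condition
AND NOT (ACCOUNT <> 1100001 AND ACCOUNT <> 1300001) -- not caught by the second split condition
-- which simplifies to: COST_CENTER = '5U' AND ACCOUNT IN (1100001, 1300001)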

Hadi
  • This solution, along with what everyone provided above, worked. I didn't realize how much faster the process would be with the Conditional Split; doing it this way greatly sped up the load. – Vaslo Aug 13 '19 at 21:13