0

I need to develop a SSIS Package where I will need to import/use a flat file(has only 1 column) to compare each row against existing table's 2 columns(start and end column).

Flat File data -

110000
111112
111113
112222
112525
113222
113434
113453
114343
114545

And compare each row of the flat file against structure/data -

id  start   end
8   110000  119099
8   119200  119999
3   200000  209999
3   200000  209999
2   300000  300049
2   770000  779999
2   870000  879999

Now, If need to implement this in a simple stored procedure that would farely simple, however I am not able to get my head around this if I have do it in SSIS package.

Any ideas? Any help much appreciated.

  • 1
    Are you trying to do some sort of lookup where the Flat File data value is between some start/end in your data? What are your plans if it doesn't match any of those values? Based on your sample data, are there ever instances when your values could overlap and produce different IDs for the same value? – Peter Schott Jul 05 '13 at 19:03
  • Yes, I need to look up between start and end. If it is found in that range then I need to split that range where I can add another row for that number and extend that range. so the range will become id start end 8 110000 111112 9 111113 119099 – Aric Smith Jul 07 '13 at 20:25

1 Answers1

0

At the core, you will need to use a Lookup Component. Write a query, SELECT T.id, T.start, T.end FROM dbo.MyTable AS T and use that as your source. Map the input column to the start column and select the id so that it will be added to the data flow.

If you hit run, it will perform an exact lookup and only find values of 110000 and 119200. To convert it to a range query, you will need to go into the Advanced tab. There should be 3 things you can check: amount of memory, rows and customize the query. When you check the last one, you should get a query like

SELECT * FROM 
(SELECT T.id, T.start, T.end FROM dbo.MyTable AS T`) AS ref 
WHERE ref.start = ?

You will need to modify that to become

SELECT * FROM 
(SELECT T.id, T.start, T.end FROM dbo.MyTable AS T`) AS ref 
WHERE ? BETWEEN ref.start AND ref.end

It's been my experience that the range queries can become rather inefficient as it seems to cache what's been seen already so if the source file had 110001, 110002, 110003 you would see 3 unique queries sent to the database. For small data sets, that may not be so bad but it led to some ugly load times for my DW.

An alternative to this is to explode the ranges. For me, I had a source system that only kept date ranges and I needed to know by day what certain counts were. The range lookups were not performing well so I crafted a query to convert the single row with a range of 2010-01-01 to 2013-07-07 to many rows, each with a single date 2013-01-01, 2013-01-02... While this approach lead to a longer pre-execute phase (it took a few minutes as the query had to generate ~30k rows per day for the past 5 years), once cached locally it was a simple seek to find a given transaction by day.

Preferably, I'd create a numbers table, fill it to the max of int and be done with it but you might get by with just using an inline table valued function to generate numbers. Your query would then look something like

SELECT
    T.id
,   GN.number 
FROM 
    dbo.MyTable AS T
    INNER JOIN
        -- Make this big enough to satisfy your theoretical ranges
        dbo.GenerateNumbers(1000000) AS GN
        ON GN.number BETWEEN T.start and T.end;

That would get used in a "straight" lookup without the need for any of the advanced features. The lookup is going to get very memory hungry though so make the query as tight as possible. For example, cast the GN.number from a bigint to an int in the source query if you know your values will fit in an int.

billinkc
  • 59,250
  • 9
  • 102
  • 159
  • Thank you for your feedback. But I changed the approach and used CTE with levels to build to ranges and using that to into another CTE pull ranges that are not in the existing list. it was not that complex but syntax that is thing that I struggled with. – Aric Smith Jul 17 '13 at 16:46