-1

Can someone please share a best practice for the situation I have encountered?

I mapped about 20 fixed width text files using SSIS to corresponding staging tables. The data flows as intended into these tables. These fixed width text files are large (e.g.: 1500 bytes) with different sub-record types--and thousands of records. The next step is for me to transform the data from staging into their destination tables.

There are many fields in these fixed width text files that have abbreviated codes that correspond to statuses, products, etc. (e.g.: A - apple, B - Banana, C - Cherry, etc.). There are reference tables and appendices in the fixed width text file mapping guide to decode the abbreviations. Sometimes there may be only three or four abbreviations for a field outlined in the guide. Sometimes a field may have an appendix outlining over a hundred possible choices and abbreviations.

In short, do I also need to create dimension tables for EVERY field listed in staging to decode these abbreviations? Or should I hardcode some of these fields in SQL when the options are limited?

CASE WHEN 'A' THEN 'Apple' WHEN 'B' THEN 'Banana' WHEN 'C' THEN 'Cherry' END AS fruit

My instinct is to be consistent and create a dimension table for all of these items but that may mean creating several hundred tables that would likely only be used in the staging process.

What works best here? Any help is appreciated. Thank you.

  • I can't envision enough of your problem to provide a solution, but unless you have rigorous control over your source data, I would advise a lookup table (and a lookup component in your data flow or update statements after the fact). Inevitably, you'll run into code D and now you have to either - modify code to handle Durian or add a row to a table. When production is failing at 3 a.m. because of a missed case, my experience has been that adding a row to a table is a non-entity whereas deploying new code raises peoples hackles. – billinkc Jun 08 '21 at 02:03
  • Thank you, @billinkc. Your comment is helpful. Indeed, at some point, code D will appear and need to be added to the transformation process. Better to keep the SQL code as-is at all times and change the dimension / lookup tables where applicable. – user3525534 Jun 08 '21 at 02:12
  • Also, the fixed width text files are coming from a vendor. And while the data is thoroughly documented and any changes to file format are announced in advanced, I do not have control over the source data. – user3525534 Jun 08 '21 at 02:14

1 Answers1

1

If all you have in your dimension is Code/Value then I would consider a single lookup table.

create table dimLookup(
lookUpKey int identity primary key
,lookUpType varchar(255) -- this names what you are looking up. can be reused or specific
,lookupCode varchar(100) -- this is what is being mapped
,lookupValue varchar(255) -- this is the underlying value
)

Notes:

  1. make sure you put an index on lookupType.
  2. Also the combination of lookupType and LookupCode should be unique.
  3. LookupType can be synonymous with a table name.
KeithL
  • 5,348
  • 3
  • 19
  • 25
  • Thank you! This is exactly what I did and it worked well. Luckily, the field names were unique so it's was easy to join the FACT tables to the one DIM table on the lookupCode and lookupType columns. – user3525534 Jun 18 '21 at 13:13