0

I've been doing some searching on Stack Overflow as well as Google and haven't quite found the answer to my question, so here we go:

It's been a minute since I've done a 'from the ground up' data warehouse project, so I'm dusting off some of my past knowledge, but am blanking on a solution to one of my data load scenarios.

I am creating a Fact Table (factOrderLines) with of course many dimensions joined to it. One of the dimensions I would like to link to factOrderLines is the dimItem. The problem is an Item is unique based on either the item's vendor and vendor part number, manufacturer and manufacturer part number, or an identifier from a subset of items called ManagedItems (MngItemID).

source ex:

Vendor   VendorPartNo  Manufacturer   ManufacturerPartNo MngItemID 
100      3456          NULL           NULL               67
100      3254          03             1234               23
NULL     NULL          03             1235               24
NULL     NULL          15             5120               NULL

Problem is when I do my join to the dimItem table from my source table to populate the factOrderLines table I have three lookup scenarios. This is causing the numbers to inflate and performance to be horrible.

LEFT OUTER JOIN dimItem AS i ON  
    (i.Vendor = src.Vendor AND i.VendorPartNo = src.VndrItemID) OR  
    (i.Manufacturer = src.Manufacturer AND
    (i.ManufacturerPartNo = src.MfgItemID) OR (i.MngItemID = src.MngItemID)

Is there a more efficient/better approach to this scenario than what I have started to implement?

edit: Full INSERT query (for better understanding)

INSERT INTO fctOrderLine
           (PurchaseOrderKey
           ,DateKey
           ,PurchaseOrderLineNo
           ,VendorKey
           ,ManufacturerKey
           ,ItemKey
           ,UnitPrice
           ,Qty
           ,UnitOfMeasure
           ,LineTotal)
SELECT      PurchaseOrderKey    =   po.PurchaseOrderKey
           ,DateKey             =   ISNULL(c.DateKey, 19000101)
           ,PurchaseOrderLineNo =   ISNULL(p.POLineNbr, -1)
           ,VendorKey           =   ISNULL(v.VendorKey, -1)
           ,ManufacturerKey     =   ISNULL(m.ManufacturerKey, -1)
           ,ItemKey             =   ISNULL(i.ItemKey, -1)
           ,UnitPrice           =   ISNULL(p.UnitPrice, -1.00)
           ,Qty                 =   ISNULL(p.POQty, -1.00)
           ,UnitOfMeasure       =   ISNULL(p.ANSI_UOM, N'UNKNOWN')
           ,LineTotal           =   ISNULL(p.LineTotalCost, -1)
FROM        stgOrders AS p
INNER JOIN      dimPurchaseOrder AS po ON po.OrderNo = p.PONumber   
LEFT OUTER JOIN dimCalendar AS c ON c.Date = (CASE WHEN p.DT_PO IS NULL OR ISDATE(REPLACE(p.DT_PO, '''', '')) = 0 THEN CAST('19000101' AS DATETIME) ELSE REPLACE(p.DT_PO, '''', '') END)
LEFT OUTER JOIN dimVendor AS v ON v.VendorID = p.VendorID
LEFT OUTER JOIN dimManufacturer AS m ON m.ManufacturerID = p.MfgID
LEFT OUTER JOIN dimItem AS i ON (i.VendorKey = v.VendorKey AND i.VendorPartNo = p.VndrItemID) OR (i.ManufacturerKey = m.ManufacturerKey AND i.ManufacturerPartNo = p.MfgItemID) OR (i.MngItemID = p.MngItemID)
jmed1504
  • 1
  • 2
  • Each dimension in a star schema needs to have its own surrogate key (meaningless autoinc int). – Sev09 Mar 04 '15 at 16:19
  • 1
    Why don't you join on dimItem.ItemKey? – Tab Alleman Mar 04 '15 at 16:38
  • @TabAlleman sorry about that, i just edited the question, I did not mean to include itemKey column as it is not in the source. – jmed1504 Mar 04 '15 at 16:52
  • @user2556563, given this new information, by initial comment still stands. Star schema methodology dictates that each fact and dimension needs to have a surrogate key, or meaningless auto increment integer. This is what you'll join on in the Fact table logic. – Sev09 Mar 04 '15 at 17:03
  • @user2556563 thanks for your suggestion! I understand that but if the source database table wont have the meaningless or surrogate key. so when using that as a source and doing your lookups on the dimensions to fill the keys in for the fact table, thats the part my question involves... IE: SELECT dimA.SurrogateKey FROM srcOrder LEFT OUTER JOIN dimA ON dimA.businessKey = srcOrder.BusinessKey... is that not the general approach? i could be wrong :) – jmed1504 Mar 04 '15 at 18:16
  • @user2556563, I'm not sure I understand the question, but if I do, then yes - that's correct. The Fact Table is made up of all Dimension surrogate keys and the "facts," which are the calculated values (totals, avgs, etc.) – Sev09 Mar 04 '15 at 18:20
  • Are you saying that you are going to Update the Fact table with the surrogate key from the Dimension table, and are concerned about the performance of that one-time operation? why? – Tab Alleman Mar 04 '15 at 18:20
  • I'm saying that the fact table sums are being inflated and performance takes a hit when run my insert to the Fact Table, i have 4 other dimensions joined to the source table to fill the Fact table with keys, they run in seconds and of those table sums they are pretty much spot on, but when i try to fill in the facts from my last dimension (Item) the numbers blow up and the performance now changes to well over an hour – jmed1504 Mar 04 '15 at 18:25
  • @jmed1504, so does your Item Dimension contain a surrogate key? And what about your other dimensions? Thanks. – Sev09 Mar 04 '15 at 18:32
  • @Sev09 thanks for your responses! I created an edit with a watered down INSERT statement that I'm trying to use to populate the fact table. All the Joins are the same, i just weeded out some of the extract facts I was inserting...Sorry my question is confusing, I'm trying to think of the best way to phrase it and what info i can provide that would not be info overload lol – jmed1504 Mar 04 '15 at 18:40
  • Ah! Just saw your edit. Let me digest for one moment. – Sev09 Mar 04 '15 at 18:43
  • So it looks like the Item Dimension is the only dimension that doesn't have its own Surrogate Key. I still maintain that you need to create an "Item ID" in the Item Dimension to link on. The complex join for Item is where your query is struggling and duplicating. – Sev09 Mar 04 '15 at 18:50
  • @Sev09 I have all the business keys and the surrogate key in the dimItem right now, but it is not in the source table (stgOrder) that I am joining to to get data such as UnitPrice and LineTotal to populate the fctOrderLine table. dimItem fields:[ItemKey,MngItemID,VendorPartNo,ManufacturerPartNo,ItemDescription,VendorKey,ManufacturerKey ] – jmed1504 Mar 04 '15 at 18:56
  • I think I understand more now... So are there any potential scenarios where an item could have the same vendor and manufacturer? – Sev09 Mar 04 '15 at 19:01
  • Yup.. pretty much any permutation of the combination exists pretty much lol except there has to be either a Vendor Item, Manufacture or ManageItemId for sure, they are never all NULL. but any other combination of those three existing is possible, and that's what I think is blowing up the numbers – jmed1504 Mar 04 '15 at 19:07

0 Answers0