1

Folks,

Quick Version: How should I model HUBs, SATs and LINKs when I have multiple domain lookup references in my HUB_SAT? If you are were to generically model these from the source schema, how would you differentiate between FKs that should be LINKs and FKs that should be References?

Long Version:

I am building out a generic solution for generating DV models from an existing 3NF MSSQL schema. In my source database I have one huge Domain reference table which holds the majority of the business lookup keys

  • Key INT (Unique)
  • TypeID INT
  • Description VARCHAR
  • Posting Code
  • ... some other fields that are not relevant to the discussion

As I see it there are four basic choices for linking to this table

  1. Create it as a HUB and then produce LINK tables for each business HUB that refers to it
  2. Create it as a single ReferenceLookup table and include the R_ReferenceIDs in the SAT table
  3. Create a separate ReferenceLookup table for each TypeID and link from the SAT using the R_ReferenceID
  4. Create Separate HUBs for each TypeID and generate LINK tables
  5. Create a single LINK table with a LINK_SAT table to hold details of which reference value is mapped by the LINK

and of these #3 feels like the best design (but also the hardest to model correctly - especially as in my case the lookup table has a FK to the Type table)

From the Wikipedia for DataVault, Reference tables are referenced from Satellites, but never bound with physical foreign keys.

My generic code is based on the design pattern as explained in the BIML DataVault walkthrough

I am looking at all tables in the source schema to determine whether they are a HUB (Have PK and multiple FKs plus fields that are not FKs), SAT (Have PK and only one FK) or LINK (Have PK, more than one FK and all fields are in PK/FK)

I then build:

  • HUBs with the a HUB_ID and the source PK
  • SATs with the non FK fields of the HUB source table
  • SATs for the source SAT tables
  • LINKs from the source LINK tables
  • LINKs from the HUB FK relationships

This is all working up to a point (i.e. I have tables for all of the above) however there are some pretty wide tables where a significant number of the fields are simply R_RefID fields all looking up on the same HUB and they are all bound with FKs on the entity table referencing the reference table

E.G. source Asset table has reference fields for - Asset Type - Asset Purpose - Asset Manager - Asset Funder - ...

so in the preliminary model I have:

  • ASSET_HUB (HubID, Asset_ID)
  • ASSET_SAT (SAT_ID, BuildDate, DisposalDate, ....)
  • Lookup_HUB (Hub_ID, LookupID)
  • ASSET_Lookup_1_LINK) (Link_ID,ASSET_HUB_ID,Lookup_HUB_ID)
  • ASSET_Lookup_2_LINK) (Link_ID,ASSET_HUB_ID,Lookup_HUB_ID)
  • ASSET_Lookup_3_LINK) (Link_ID,ASSET_HUB_ID,Lookup_HUB_ID)
  • ASSET_Lookup_4_LINK) (Link_ID,ASSET_HUB_ID,Lookup_HUB_ID)

but there is no way of identifying what each of the LINK tables respresents in the domain model

How would you go about interrogating the schema to determine whether the table is a genuine HUB candidate or whether it should be a REF table instead and how would you determine whether an FK should be treated as a LINK or a SAT.R_RefID. I am after strategy rather than code (but I ;m not going to turn down code if it is on offer :) ) My source DB is SQL2008R2 and my development environment i SQL2016_Dev

In Response to tobi6:

In the source system The business entity has a number of attribute fields which are just XXX_ID types that look up their descriptors from the domain reference table. If you model this domain reference table as a HUB then you either have to have separate link tables for each lookup (LINK tables are automatically generated because there is an FK on the business entity), or multiple active LINK records with a LINK_SAT to identify which attribute you are tracking (actually this creates a 5th design pattern option). If I tag the domain reference table as a REFerence then the XXX_IDs stay in the HUB_SAT which feels like a better solution but is harder to model generically. I.e. how do I determine whether the business entity FK should create a LINK, LINK and LINK_SAT or SAT.R_RefID

Aaron Reese
  • 544
  • 6
  • 18
  • Maybe [this answer](https://stackoverflow.com/questions/39985216/data-vault-model-what-are-hubs-good-for/40002696#40002696) will clear things up a bit. Also, I have trouble understanding how those Asset Lookup Links get so many. I think hub candidates are entities which are being used in the company (and they might have a good business key). Do you hash the keys, DV2.0? – tobi6 Jun 28 '17 at 10:15
  • Thanks for the contribution tobi. That question does not provide me with a solution though :(. In answer to your question I have updated the original post to provide more context – Aaron Reese Jun 28 '17 at 12:02

0 Answers0