2

I was just reading about Data Vault modeling and as far as I understand it, the hub does only contain keys (and the record source). So I was wondering why I should create those hub tables, only to store the record source? Wouldn't it be enough to have only Satellites and Links?

Btw: I'm looking for simple mysql tables in a data vault form to download and play with.

2 Answers2

3

One of the main concepts of Data Vault modeling is the separation of Business Keys, Satellites for detail data and Links to connect Hubs.

Example

Employee
--------
Personnel Number
Name
Surname
Street
City

Department
--------
ID
Shortcode
Name
Employee Number

Imagine that one department only has one employee.

Business Keys

Now the business identifiers for the business objects Employee and Department need to be identified. This would be Personnel Number for the Employee and Shortcode for Department.

Why not ID for Department? Well, the ID is most probably a database internal ID. The shortcode is in this example something like DEP_A1613, which is also internally used to identify the department.

Modelling

The hub for Employee consists only of the field Personnel Number and the hub for Department only of Shortcode.

This means the Hub in Data Vault modelling is for storing the business key only. Of course, Data Vault fields like Record Source, Load Date and other are needed as well. Both Hubs would also have corresponding Satellites for the describing data. It would be a violation of the Data Vault modeling technique to link Satellites together without Hubs. It wouldn't make sense, either: you need some kind of common identifier for your Satellite data which wouldn't be there if you would omit the Hub.

Conclusion

So to answer your question: You should model Hubs for business keys. Absolutely. Hubs are in fact an essential element of Data Vault modeling. Links are only connected to Hubs, not to Satellites.

Imagine a change in the Employee software. All other fields are now stored in the Employee satellite. When using a new source Employee software you could store all data in a new satellite whilst using the same Hub and business key.

Just to complete this example: the link would connect Employee and Department from Department with Employee Number.

EDIT

So for example the structure would look like this. Data Vault specific fields are marked with [DV]:

Hub Employee
------------
Employee Hash Key [DV]
Load Date [DV]
Record Source [DV]
Personnel Number

Sat Employee
------------
Employee Hash Key [DV]
Load Date [DV]
Load End Date [DV]
Record Source [DV]
Hash Diff [DV]
Name
Surname
Street
City

Link Employee Department
------------------------
Employee Department Hash Key [DV]
Employee Hash Key [DV]
Department Hash Key [DV]

Hub Department
--------------
Department Hash Key [DV]
Load Date [DV]
Record Source [DV]
Shortcode

Sat Department
--------------
Department Hash Key [DV]
Load Date [DV]
Load End Date [DV]
Record Source [DV]
Hash Diff [DV]
ID
Name
tobi6
  • 8,033
  • 6
  • 26
  • 41
  • Thank you very much for your detailed answer. But I must admit I still don't get the point. What is the use of a hub table containing only the dept. shortcode? Since the shortcode must also be in the satellite, I see no reason for reading the hub table!? –  Oct 13 '16 at 08:20
  • This isn't true: the shortcode won't reside in the satellite. No business key or link key resides in the satellite. The satellite will be connected to the hub with the business key's hash key. – tobi6 Oct 13 '16 at 08:22
  • Ok so the hub separates the department shortcode from the data in the satellite. What would be the disadvantage of the shortcode being the satellite? –  Oct 13 '16 at 19:45
  • 1st this would not be part of the modelling technique. You'll want to avoid being able to join Satellites to Satellites, which could enable Power Users to do so. 2nd the hub separates the *business key* from the describing satellite data. 3rd the hub is the place to join via links (one of driving factors of DV) to embed new sources. This wouldn't work with the business key in satellite. Think about parallelization, wouldn't be possible without the hash keys. – tobi6 Oct 14 '16 at 07:47
  • @tobi6 Quick additional question: Why is the record source in the hub? Suppose a record comes from multiple sources (e.g. in the case of a customer, suppose there is a CRM source, a sales source, and a warranty source)? In most cases, wouldn't you have 3 satellites, each tied to one source. If so, shouldn't the source information be in the satellites?) Thanks! – Ben Nov 03 '16 at 17:28
  • The Record Source should reside in both Hub and Satellite, especially when having multiple Satellites. You want to be able to audit your way back to *which* system provided the business keys. This could also be true if the Hub would be filled from several systems (if the business key is the same). If it differs, it would be done via Same-As-Links. – tobi6 Nov 03 '16 at 20:19
2

The hub is where the passive integration of multiple sources is applied. You would have a column for data source and record all instances of each key as it first arrives in your hub. Example, if I have a CRM system and an ERP system and I sync the data from the CRM system first, then the ERP data comes available. I would add all of the keys from the CRM system, with the data source column value of "CRM". Then when I bring in the ERP system, assuming I have the same structure of keys for the table, I would only add the new keys that only exist in the ERP system with a data-source of "ERP". If the keys are different, you would have to add all of the data from both systems. The point is that you are retaining all of the data from all of the systems in play. When you move to your next layer, be it a Business Data Vault or a Data Mart, you would apply business logic against the hub and satellites according to the "Business Rules" to get to a single resulting row for the two systems where applicable. If you employ the transformation before storing it in this intermediate state, you lose the audit-ability, and the ability to change the business rules at a later date. Make sense?

Michael buller
  • 566
  • 4
  • 11
  • 1
    When having different systems sourcing the same Hub, it is very important that the business key looks alike. If this is not the case, Same-As-Links should be used. This would also take the logic out of the Business Vault layer (business rules) and provide the wanted single entry for the Hub. I would not apply the logic in the next layer. – tobi6 Nov 07 '16 at 09:53