1

I have a Person Satellite with a Gender attribute. From source systems the values for this attribute can be: F, M, FEMALE, or MALE. Which of the two following approaches is the correct one for Data Vault modeling?

  1. Store data in Gender as it comes from sources and in the Business Vault or Data Marts standardize the values to FEMALE and MALE only

  2. Create a cross-reference table to map out F to FEMALE and M to MALE, while loading the Person Satellite, transform F to FEMALE and M to MALE using the cross-reference table.

I'm using Amazon Redshift that supports column compression.

Cesar Vinas
  • 343
  • 1
  • 10
  • 20
  • I can't really answer your question but I say that Redshift and DV sounds like a painful combination. In my experience Redshift performance degrades badly as the number of joins increases. – Joe Harris Feb 23 '17 at 15:07
  • Thanks, @Joe Harris. I understand about the multiple joins. I will have some PIT's and bridges in the Business Vault to help with that as well as create a number of specific Information Marts for end users that will present a more unified view of the data – Cesar Vinas Feb 27 '17 at 14:23

2 Answers2

2

I emailed Daniel Linstedt, creator of the Data Vault modeling method, to ask him the same question. His answer:

"I typically store it as it comes in, THEN translate it on the way to the Business DV.  This way, if the business ever changes it's mind, we can re-write the translation rule without affecting history.  But more than that, I've seen source systems that deliver values outside the boundaries of what's acceptable.  Do not try to translate on the way in to the Raw DV, to do so would destroy auditability."

Cesar Vinas
  • 343
  • 1
  • 10
  • 20
0

Data vault concept is useful when you have a very complex business logic that changes over time but F/Female and M/Male mapping is a pretty simple and stable logic. Having a cross-reference will be just overcomplicating things here. I would just standardize the values to F/M and use char(1) column without compression here.

AlexYes
  • 4,088
  • 2
  • 15
  • 23
  • Thanks, @AlexYes. I'm also of the idea of standardizing these values. However, I'm more interested to know about the cons of doing this as the Data Vault principles clearly state that no transformation should be done in the Raw Vault – Cesar Vinas Feb 27 '17 at 14:25