17

I think the question is clear enough. Some of the columns in my datawarehouse table could have a relationship to a primary key. But is it good practice? It is denormalized, so it should never be deleted again (data in datawarehouse). Hope question is somewhat clear enough.

Jon Seigel
  • 12,251
  • 8
  • 58
  • 92
Lieven Cardoen
  • 25,140
  • 52
  • 153
  • 244

8 Answers8

14

Using FK-constraints in a DW is like wearing a bicycle helmet. If the ETL is designed correctly, you technically don't need them. That said, if I had a million dollars for every time I've seen bug-free ETL, I'd have zero dollars.

Until you're at a point where FK-constraints are causing performance issues, I say leave'em. Cleaning up referential integrity problems can be much harder than adding them from the get-go ;-)

Bill Anton
  • 2,920
  • 17
  • 23
  • 1
    My 20+ years of experience in data and data warehousing agree with you... Projects change/evolve and customers (and developers!) can easily introduce changes that break assumptions. Having FKs is indeed a great safety net--"bicycle helment" rocks as a simile! Failing that, I'd encourage a final "validate" phase of the load process that at least checks the constraints/uniqueness in the data. Great answer, Bill. – Eric Kramer Apr 29 '16 at 12:59
11

I presume that you refer to FKs in fact tables. During DW loading, indexes and any foreign keys are dropped to speed up the loading -- the ETL process takes care of keys.

Foreign key constraint "activates" during inserts and updates (this is when it needs to check that the key value exists in the parent table) and during deletes of primary keys in parent tables. It does not play part during reads. Deleting records in a DW is (should) be a controlled process which scans for any existing relationships before deleting from dimension tables.

So, most DWs do not have foreign keys implemented as constraints.

Damir Sudarevic
  • 21,891
  • 3
  • 47
  • 71
10

FK constraints work well in Kimball dimensional models on SQL Server.

Typically, your ETL will need to lookup into the dimension table (usually on the business key to handle slowly changing dimensions) to determine dimension surrogate IDs, and the dimension surrogate id is usually an identity, and the PK on the dimension is usually the dimension surrogate id, which is already an index (probably clustered).

Having RI at this point is not a huge of overhead with the writes, since it can also help catch ETL defects during development. Also, having the PK of the fact table being a combination of all the FKs can also help trap potential data modeling problems and double-loading.

It can actually reduce overhead on selects if you like to make general-use flattened views or table-valued functions of your star models. Because extra inner joins to dimensions are guaranteed to produce one and only one row, so the optimizer can use these constraints very effectively to eliminate the need to look up into the table. Without FK constraints, these lookups may have to be done to eliminate facts where the dimension does not exist.

Cade Roux
  • 88,164
  • 40
  • 182
  • 265
6

The quesiton is clear, but "good practice" seems the wrong question.

"Could have FK's" ?

Foreign keys are a mechanism to preserve integrity constraints during database modifications.

If your DW is read-only (accumulating data sources without writing back), there is no need for FK's.

If your DW supports writes, integrity constaints typically need to be coordinated across the participating data sources by the ETL (rather, it's Store equivalent). This process may or may not rely on FK's in the database.

So the right question would be: do you need them.

(The only other reason I can think of would be documentation of relationship - however, this can be done on paper / in a separate document, too.)

peterchen
  • 40,917
  • 20
  • 104
  • 186
  • +1. "Foreign keys are a mechanism to preserve integrity constraints during database modifications. If your DW is read-only, there is no need for FK's..." - Bull's eye! – Kent Pawar Nov 08 '13 at 14:39
  • 4
    Some databases have specific optimizations in places for star or snowflake structured data warehouses. In those cases, even on a read-only situation, the foreign keys can serve to alert the warehouse how the star is structured -- to tell it which are the fact and dimensions. Even in normalized databases foreign keys can affect the optimizer. I'm struggling to determine when and how much this matters myself right now, but it certainly does have SOME affect. – Chipmonkey Jun 25 '14 at 14:37
  • 1
    Yes, read-only means *enforcing* constraints is unnecessary--if your warehouse is a snapshot of a constraint-enforced DB. But constraints allow the DBMS to optimize queries. So "no need" is wrong. As usual it's a tradeoff. – philipxy May 28 '19 at 23:52
4

I have no idea. But nobody is answering, so I googled and found a best practises paper who seem to say the very helpful "it depends" :-)

While foreign key constraints help data integrity, they have an associated cost on all insert, update and delete statements. Give careful attention to the use of constraints in your warehouse or ODS when you wish to ensure data integrity and validation

Dave Archer
  • 3,022
  • 20
  • 23
4

Yes, as a best practice, implement the FK constraints on your fact tables. In SQL Server, use NOCHECK. In ORACLE always use RELY DISABLE NOVALIDATE. This allows the warehouse or mart to know about the relationship, but not check it on INSERT, UPDATE, or DELETE operations. Star transformations, optimizations, etc. may not rely on the FK constraints to improve queries like they used to, but one never knows what BI or OLAP tools will be used on the front side or your warehouse or mart. Some of these tools can make use of knowing the relationships are defined. Plus, how many ugly looking warehouses have you seen with little or no external documentation and had to try to reverse engineer them? Defining the FKs always helps with that.

As designers we NEVER seem to make our data warehouses or marts as self-documenting as we should. Defining FKs certainly helps with that. Now, having said this, if star schemas are properly designed without FKs being defined, it is easy to read and understand them anyway.

And for ORACLE fact tables, always define a LOCAL BITMAP index on every FK to a dimension. Just do it. The indexing is actually more important than the FK being defined.

T. W.
  • 41
  • 1
  • A relationship with NOCHECK? That sounds like a good idea for a DW. You document the relationships as the system is designed, but you don't break ETL in the event there is an issue with the source system. – SherlockSpreadsheets Jan 28 '19 at 15:39
3

The reason for using a foreign key constraint in a data warehouse is the same as for any other database: to ensure data integrity.

It is also possible that query performance will benefit because foreign keys permit certain types of query rewrite that are not normally possible without them. Data integrity is still the main reason to use foreign keys however.

nvogel
  • 24,981
  • 1
  • 44
  • 82
1

There is a very good reason to create FK constraints in even read-only DW/DM. Yes, they are not really required from read-only DW itself point of view, if your ETL is bullet-proof, etc., etc. But guess what - the life doesn't stop at the loading data in DW. Most of the BI analytical/reporting tools are using information about your DW relationships to automatically build their model (for example SSAS Tabular model). In my humble opinion this alone outweighs the little overhead on dropping and recreating FK constraints during ETL process.