Do you really need to use the primary-foreign key pattern in a data warehouse?

Question

I work with a large datamart (2+ TB of stored data).

We have over 1000 SSIS packages that run every day. They pull data from over 100 different data sources, into our datamart. This data is then summarized and appended to our fact-tables.

We make no use of the primary-key / foreign-key SQL pattern. If a procedure is slow, we will add indexing on joined fields or where clauses. But, no use of the built-in primary-key / foreign-key.

We don't 'normalize' our data to any great extent - so there is repetition of data all over the place - and we could trim fat in that manner. But, this would make it harder for analysts to get the data they need quickly, as well as increasing the development time of making new procedures.

This seems to be working fine for us.

I was wondering if any really experienced SQL developer / DBA could comment on this, and offer some wisdom about the use of Primary-foreign keys in situations where you're not serving up data to other users, and you're loading a tremendous amount of data every day.

If your system works, good for you. Primary key/foreign key relationships would make it simpler for new users to understand the data model and how to use the database. However, what you are asking for is confirmation of your current design, which I think is a matter of opinion. — Gordon Linoff, May 28 '19 at 15:53
I do often create pk/fk relation as disable/not enforced by engine. So it will not slow the load. It really depend on what are your design patterns like starts, data vault or something else. — Piotr Palka, May 28 '19 at 16:04
A lot of purpose built databases for data warehouse/data mart use do not enforce PK/FK relationships because of performance reasons. Mostly it's up to ETL to take care of these, there are also additional benefits maintaining PK/FK in ETL. — demircioglu, May 28 '19 at 16:20
As an example when you use a lookup for a dimension key in your fact load, based on the incoming data you can classify and and assign multiple keys for data anomalies. for example 0 for no match, -1 incomplete data, -2 not available (null) etc. If PK/FK enforced on the database these rows will fail to load. Depending on use case, you might want to load them and deal with them later. — demircioglu, May 28 '19 at 16:33
A fk can be helpful when querying because it provides guidance to tools and the optimizer... but you need an index on the referencing table columns to get index-optimization. You're already doing indexes so the question is whether you want optimized query hints (probably not). — Zorkolot, May 28 '19 at 17:40
I would like to say @ColinMac that unless your business does not care about the data integrity and quality use PK-FKs and ensure they unforced and trusted (may pertain to SQL Servers only https://wateroxconsulting.com/archives/untrusted-foreign-keys/) — Arthur, May 28 '19 at 17:57
Possible duplicate of [Is it good practice to have foreign keys in a datawarehouse (relationships)?](https://stackoverflow.com/questions/2690818/is-it-good-practice-to-have-foreign-keys-in-a-datawarehouse-relationships) — philipxy, May 28 '19 at 23:48
Constraints allow the DBMS to reject certain updates to invalid database states. That is not needed for warehousing--if your warehouse is a snapshot of a constraint-enforced DB. However they also allow the DBMS to optimize querying--because it knows certain cases can't arise. What did you learn researching this? We can expect the reasons for PKs & FKs to be a faq, and almost certainly PK & FK use in warehouses to be a faq. See [ask] & the voting arrow mouseover texts. — philipxy, May 29 '19 at 00:01

Do you really need to use the primary-foreign key pattern in a data warehouse?

0 Answers0