0

I have three facts in my warehouse that can be related events in my relational db. They are PhoneContact, Appointment and Donation. A PhoneContact can result in an Appoinment and/or a Donation. I already have the Apppointment and Donation facts with their related dimensions and am now adding PhoneContact to my warehouse. The common dimension between all of these facts is the Donor dimension which describes who received the call and made the appointment and donation.

If a PhoneContact did result in an Appointment and/or Donation, I'd like to join those facts, but my understanding is that joining facts is a no-no. How would I best relate those Facts? Right now I can't think of anything better, so I'm considering putting AppointmentID and DonationID fields in my Phonecontacts fact.

More info: there are about 1.2M PhoneContacts per month but only about 100k of those result in an Appointment or Donation, so aside from not joining facts, just putting 1.1M NULLs per month into the table so I can get the 100K other events seems less than great.

kero
  • 10,647
  • 5
  • 41
  • 51
Brian B
  • 1
  • 1
  • 1
    Are rows in the `Appointment` and `Donation` tables always related to a `PhoneContact`? If so, consider flipping the problem on its head - create a FK column to `PhoneContact` from the `Appointment` and `Donation` tables. – Mike Dec 20 '13 at 05:35
  • Good question @Mike. They are not. There can be an `Appointment` or a `Donation` without a `PhoneContact`. – Brian B Dec 20 '13 at 17:02
  • Just my two cents: as with any data warehouse, I recommend presenting a model that will minimize surprises to your end users. Although `Appointment` and `Donation` rows only sometimes have a corresponding row in the `PhoneContacts` table, a sparse FK column may be the most 'natural' solution for your end users. A more normalized solution, like adding a many-to-many table, will eliminate nulls, but it does not really belong in a data warehouse. So, joining fact tables may be the lesser of of the evils you have to choose from. – Mike Dec 20 '13 at 22:06
  • You answered your own question: you join them through the Donor dimension. Regardless, you should phrase requirements of your DW in business questions. What question is being asked that requires all facts to be joined? – Nick.Mc Dec 28 '19 at 14:40

1 Answers1

1

There seems to be a trade-of here between space and performance. It seems like joining would save space. On the other hand if we used a denormalized table (already joined), we might get better performance on complicated group by queries that require scanning entire tables.

Note that joining can be less expensive in some scenarios :

  • If you your tables are sorted based on the join key, joining will be less expensive (because we will use merge join algorithm).

  • If your queries yield small num of rows (eg. give me information about John), joining will be affordable with nice indices.

If you think your use case consistently falls out of the above categories and you can easily buy more disk space, creating an already joined table can help in increasing query speed.

Harsh Verma
  • 529
  • 6
  • 10