3

I have a general question about star model in business intelligence project. For example, let's suppose that the project is comprised of one fact table (F) and 3 dimensions (D1, D2, D3). Furthermore, let's assume that fact table looks like this:

d11   d21   d21        m11   m21   m21
d12   d22   d22        m12   m22   m22
d13   d23   d23        m13   m23   m23
d14   d24   d24        m14   m24   m24
d15   d25   d25        m15   m25   m25
d16   d26   d26        m16   m26   m26
d17   d27   d27        m17   m27   m27

For example - d23 = dimension nr 2, value nr 3 in the dimension (same is for measures)

Now, let's assume that a selection is made on every of 3 dimensions and that following parts of the fact table are selected:

d11   d21   d21        m11   m21
D12   d22   D22        m12   m22
D13   D23   D23        m13   m23
D14   D24   D24        m14   m24
d15   D25   D25        m15   m25
d16   d26   D26        m16   m26
d17   d27   d27        m17   m27

Now I would like to know which selections (marked with uppercase 'D') will/should/need to be in star model considered? If OUTER JOIN principle is applied, then the following will be selected:

D12   d22   D22        m12   m22
D13   D23   D23        m13   m23
D14   D24   D24        m14   m24
d15   D25   D25        m15   m25
d16   d26   D26        m16   m26

ie for the selection in the first meassure following values will be considered (m12, m13, m14, m15, m16) and for second measure (m22, m23, m24, m25, m26).

On the other hand, if INNER JOIN is between fact table and dimension tables, result will be following selection:

D13   D23   D23        m13   m23
D14   D24   D24        m14   m24

ie for first measure following values would be considered (m13, m14) and for the second measure (m23, m24) in corresponding aggregate functions.

Which of the following approaches is taking place?

Adam
  • 2,347
  • 12
  • 55
  • 81

1 Answers1

3

I cannot say that for all systems, but the standard way of selection is that you do selections on the dimension tables, which are then INNER JOINed to the fact table to filter the data, and INNER JOINed back to the dimensions where you have no selections but where columns are shown in the result.

FrankPl
  • 13,205
  • 2
  • 14
  • 40
  • Exactly the information I was looking for. Many thanks! One more question - are you familiar maybe if BusinessObjects, MS Analysis Services, and QlikView implement this principle? I presume the answer would be that it depends on how a developer joins the tables in the script/interface. However, I would like to hear your opinion. – Adam Apr 11 '14 at 07:22
  • Furthermore, it is clear to me that "you do selections on the dimension tables, which are then INNER JOINed to the fact table". However, why is then INNER JOINed back to the dimensions where there are NO selections? In my opinion, we would loose data ie the values of measures would be lower. What is the good argument that supports your claim and weakens the clame of using OUTER JOIN in the latter case? – Adam Apr 11 '14 at 07:26
  • I agree. In Qlikview you would load the fact table in the load script and in the inteface you are able to select the values of the dimension in different lists (d1, d2, d3). Selecting values in this lists would lead into a conjunction (like INNER JOIN). – smartmeta Apr 11 '14 at 07:33
  • 1
    @Adam I edited my answer, making clear that the dimensions with no selections is only done for those where column values are needed for the result. And, actually, assuming you have no `null`s (or invalid entries) in the foreign key columns of the fact table, the inner join would not remove anything from the result. – FrankPl Apr 11 '14 at 09:31