0

I wonder why fact tables are bigger in size than dimension tables in data warehouses. Dimension tables contain the attribute-level information, and are highly de-normalized, so why are dimension tables not bigger in size ?

cmaher
  • 5,100
  • 1
  • 22
  • 34
Arzu
  • 46
  • 3
  • 6
  • Facts can get bigger over time since all historical transactional data will be put into the fact table. Dimensions could expand as well, but usually not as fast or not all of them. – tobi6 Feb 09 '18 at 09:30
  • @tobi6 - Agreed but dimensions has the entity level data so ideally it would be having more information than Facts. Say- One merchant has business of manufacturing bricks so average number of bricks are generated by a person would store in Fact and the detailed information of that person like - email id, address, phone number etc will be present in Dimension so ideally it should be bigger in size. – Arzu Feb 09 '18 at 11:23
  • For me bigger in size is directly connected to the amount of rows in a table. It seems you think that a bigger size is the amount of columns. This might be a question of definition. – tobi6 Feb 09 '18 at 11:36

3 Answers3

0

I could start by stealing some words off Kimball

"Dimensional modeling begins by dividing the world into measurements and context." https://www.kimballgroup.com/2003/01/fact-tables-and-dimension-tables/

Fact tables record business activities or events and for that reason fact tables could grow in size. Dim Tables store information on different contexts.

For eg: In an university 100 students might be enrolling in 10 subjects. Now if you see the dims, Dim_Student and Dim_Subject, in this scenario they might have 100 rows and 10 rows each. But the activity of enrolments will be much more, as students can enrol into 0 or many subjects at the same time. This could lead to the Fact_Enrolment(which records the enrolment activities) table having lot more rows when compared to the dims.

Note: However in my experience I have also worked with facts where the fact tables have lesser rows when compared to the dims, at a particular point in time. They might grow in size eventually when the DataWarehouse grows.

Hope that helps.

NITHIN B
  • 214
  • 1
  • 9
  • Thanks. Even I have worked with facts where they are lesser than Dim this is why I am confused but your example clarifies that Fact can be bigger over the time. Thanks for the help. +1 for example. – Arzu Feb 12 '18 at 08:58
0

Dimensions contain entity level information whereas facts contain transaction level information and for a dimension multiple transaction can take place over a period of time. For example, in a HR system, there can be a person dimension containing personal details of all the employees wherein typically there may be 1-3 records for each employee.

Fact tables will store multiple transactions of the employees e.g., hires, promotion. movement/change of departments, leaves Termination etc. so corresponding to one-person record in person dimension there will be multiple records in facts.

Also Fact Tables contains facts / measures corresponding to multiple dimensions And so facts are joined with multiple dimensions using a surrogate key/ foreign key reference to different dimensions which makes the fact table heavier than dimensions.

Kobi Lehrer
  • 151
  • 5
-1

Dimension tables contains the attribute level information and highly de-normalized

Actually, I doubt that dimension tables are "highly de-normalized". Generally speaking, each row in a dimension table is identified by a primary key so there is very less scope of having duplicates in them. This can explain why they do not get too big in size compared to fact tables.

clocksp33d
  • 44
  • 5