Star Schema: How the fact table aggregations are performed?

Question

https://web.stanford.edu/dept/itss/docs/oracle/10g/olap.101/b10333/globdiag.gif

Assume that we have a start schema as above..

My questions is - In real-time how do we populate the colums (unit_price, unit_cost) columns of the fact table..?

Can anyone provide me a start schema tables with real data?

I am having hard time in understanding star schema...

Please help!..

score 0 · Answer 1 · answered Nov 15 '15 at 18:45

Start schema consists of two types of tables fact tables and dimensions.

The ideal of the star design is that you can split your data in two part. The static part is described with dimensions and the dynamic part (= transactions) in the fact table.

Each transaction is stored in the fact table as a new record and is connected to the surrounding dimensions, that define the context of the transaction.

The example in link contains two fact tables: SHIPMENTS and PRODUCT_CONDITIONS. Note that the fact tables in the link are dubbed UNITS_HISTORY_FACT and PRICE_AND_COST_HISTORY_FACT, but I find this not a best choice.

The SHIPMENTS table stores one record for each shipment of a PRODUCT to a CUSTOMER at some TIME, via a defined CHANNEL. All the above information is defined using the corresponding keys of the respective dimensions. The fact table also contains MEASURES describing the attributes of the transaction, here the number of UNITS shipped.

The structure of the fact table would be therefore

CUSTOMER_ID
PRODUCT_ID
TIME_ID
CHANNEL_ID
UNITS

The second fact table (bottom) is more interesting, because here you split the product in two parts:

PRODUCT dimension defining the ID, name and other more static attributes

PRODUCT_CONDITION this is fact table, designed with the expectation the price and cost of the product will change over time. With each change of the price or cost insert a new record in the fact table and connect it to the PRODUCT and TIME (of change).

The structure of the fact table would be therefore

PRODUCT_ID
TIME_ID
UNIT_PRICE
UNIT_COST

Final note the the design of the TIME dimension.

The best practice to connect the fact table with the dimension tables is to use meaningless ID (surrogate keys), but for TIME dimension you should be careful. For big (time partitioned) fact table is often used the natural key (DATE format) to be able to deploy the partitioning features. See more details in How I Defined a Time Dimension Using a Surrogate Key and other resources in web.

Star Schema: How the fact table aggregations are performed?

1 Answers1