Questions tagged [snowflake-schema]

**Snowflake schema** is a special case of the database Star Schema, in which one or more dimension tables are normalized. This is sometimes done to improve performance in particular cases of the Star Schema. When the dimension tables are completely normalized, the resulting structure resembles a snowflake with the fact tables in the middle.

A snowflake schema is a logical arrangement of tables in a multidimensional database in which a dimension is expressed by more than one table. The dimension tables that relate directly to the fact table represent the finest granularity of data analysis. The other tables in the same dimension represent aggregations of the dimension's individual elements. When a dimension is aggregated by several independent criteria, the diagram resembles the shape of a snowflake. Snowflake schemas are used in data warehouses.
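
As a minimal sketch of this idea, the DDL below snowflakes a Customer dimension into Customer → City → Country and attaches it to a sales fact table. All table and column names here are hypothetical, chosen only for illustration.

```sql
-- Hypothetical snowflaked dimension: Customer is normalized into
-- customer_dim -> city -> country, so reaching the country name from a
-- fact row requires joining through more than one dimension table.
CREATE TABLE country (
    country_id   INT PRIMARY KEY,
    country_name VARCHAR(100)
);

CREATE TABLE city (
    city_id    INT PRIMARY KEY,
    city_name  VARCHAR(100),
    country_id INT REFERENCES country(country_id)
);

CREATE TABLE customer_dim (
    customer_id   INT PRIMARY KEY,
    customer_name VARCHAR(100),
    city_id       INT REFERENCES city(city_id)
);

CREATE TABLE sales_fact (
    sale_id     INT PRIMARY KEY,
    customer_id INT REFERENCES customer_dim(customer_id),
    amount      DECIMAL(10, 2)
);

-- Sales by country: the query walks the snowflaked chain of joins.
SELECT co.country_name, SUM(f.amount) AS total_sales
FROM   sales_fact f
JOIN   customer_dim cu ON cu.customer_id = f.customer_id
JOIN   city ci         ON ci.city_id     = cu.city_id
JOIN   country co      ON co.country_id  = ci.country_id
GROUP BY co.country_name;
```

In an equivalent star schema, city and country would simply be columns of customer_dim and the query would need only one join.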

Methods for creating Snowflake Schema out of Star Schema:

  • Partially or fully normalize one or more dimension tables, leaving the other dimensions unchanged.
  • Partially or fully normalize all dimension tables.

When to use

Query performance is of the highest significance in a data warehouse, and snowflaking can hamper it, so it should be applied selectively. It is useful when some attributes, such as demographic attributes, are browsed more often than the rest of the customer table. Another case is precalculated summaries derived from a fact table; storing them avoids running the same expensive aggregation over and over again.
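
A hedged sketch of the second case follows: the aggregation is computed once into a summary table and reports then read that table. The tables fact_orders and date_dim and their columns are assumptions for illustration, and the CREATE TABLE ... AS SELECT syntax varies slightly by database (SQL Server, for example, uses SELECT ... INTO).

```sql
-- Hypothetical pre-aggregated summary derived from a fact table.
CREATE TABLE date_dim (
    date_id        INT PRIMARY KEY,
    calendar_year  INT,
    calendar_month INT
);

CREATE TABLE fact_orders (
    order_id INT PRIMARY KEY,
    date_id  INT REFERENCES date_dim(date_id),
    amount   DECIMAL(10, 2)
);

-- The expensive aggregation is performed once and materialized...
CREATE TABLE monthly_order_summary AS
SELECT d.calendar_year,
       d.calendar_month,
       SUM(f.amount) AS total_amount
FROM   fact_orders f
JOIN   date_dim d ON d.date_id = f.date_id
GROUP BY d.calendar_year, d.calendar_month;

-- ...and reports read the small summary table instead of re-scanning the fact table.
SELECT calendar_year, calendar_month, total_amount
FROM   monthly_order_summary;
```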

Advantages

  • Savings in storage space.
  • Normalized structures have no insertion, deletion, or update anomalies.
  • Easier to update and maintain.

Disadvantages

  • Schemas are less intuitive, with higher complexity and abstraction.
  • Browsing through the contents is more difficult for users.
  • Additional joins, which can mean worse query performance in some cases.

Example

Consider a fact table called "Sales" and a dimension table called "Product", and assume there are 500,000 product dimension rows falling under 10 brands. A query over products that is not indexed on brand has to scan all 500,000 rows just to find the set of brands.

If, on the other hand, the "Product" table is partially normalized, with the brands separated out into another table "Brand", the same initial search only needs to go through 10 rows.
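
The contrast can be sketched as follows; the DDL and queries are illustrative only, with hypothetical table and column names.

```sql
-- Star schema: brand is just a column of the wide Product dimension.
CREATE TABLE product_dim_star (
    product_id   INT PRIMARY KEY,
    product_name VARCHAR(200),
    brand_name   VARCHAR(100)
);

-- Listing all brands scans every product row unless brand_name is indexed.
SELECT DISTINCT brand_name FROM product_dim_star;

-- Snowflake schema: brands are split out into their own small table.
CREATE TABLE brand (
    brand_id   INT PRIMARY KEY,
    brand_name VARCHAR(100)
);

CREATE TABLE product_dim (
    product_id   INT PRIMARY KEY,
    product_name VARCHAR(200),
    brand_id     INT REFERENCES brand(brand_id)
);

-- The same question now touches only the ~10 brand rows.
SELECT brand_name FROM brand;
```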


893 questions
0 votes, 1 answer

Data Warehousing - Snowflake Schema Normalization

To start, I am trying to differentiate between the Star Schema and the Snowflake Schema by illustrating them, but am having trouble trying to normalize the table to create the snowflake schema. The attached image is the Star Schema…
AdrianAndrews • 33 • 1 • 6
0 votes, 2 answers

Data Warehouse schema design - how to improve schema model

I have to create a Data Warehouse for a travel agency. I'm doing it for the first time. I've learned all the basics about star, snowflake and constellation schemas and about creating data warehouses. I would like to ask what could be changed for better…
0 votes, 2 answers

valid_from/valid_to against active_status in dimension table

In order to populate an SCD2 dimension table, a marker to note the latest active row is always beneficial. There are two ways I can think of: 1) valid_from/valid_to 2) active_status: active/deleted. It is clear that valid_from/valid_to keeps more…
Hello lad • 17,344 • 46 • 127 • 200
0 votes, 1 answer

Star schema: how to handle dimension table with constantly changing set of columns?

First project using star schema, still in planning stage. We would appreciate any thoughts and advice on the following problem. We have a dimension table for "product features used", and the set of features grows and changes over time. Because of…
0 votes, 1 answer

Star Schema: how are the fact table aggregations performed?

https://web.stanford.edu/dept/itss/docs/oracle/10g/olap.101/b10333/globdiag.gif Assume that we have a star schema as above. My question is: in real time, how do we populate the columns (unit_price, unit_cost) of the fact table? Can…
AKC • 953 • 4 • 17 • 46
0 votes, 1 answer

Country and Customer Dimension

I hesitate whether I should add a Country_Dimension or not, since I already have a Customer_Dimension which contains some redundant fields such as: continent_name country_name postcode_#
0 votes, 0 answers

Star Schema design

I have a question in regards to star schema design, whether I need to use snowflake (which I read should be avoided). I have the following three dimension tables: Main list dim. - contains list of people Sub lists dim. - contains all sorts of…
karel lahmy • 397 • 1 • 5 • 15
0 votes, 1 answer

Is it possible to have conditional OLAP dimension aggregators?

I have a set of OLAP cubes, in the form of a snow-flake schema, each representing one factory. I have three concepts that for some factories clearly behave as 3 dimensions, and for other factories clearly behave as 2 dimensions. The concepts are…
0 votes, 1 answer

Star Schema Design for User Utilization Reports

Scenario: There are 3 kinds of utilization metrics that I have to derive for the users. In my application, user activity is tracked using login history, the number of customer calls made by the user, and the number of status changes performed by the user. All…
0 votes, 2 answers

Relational Integrity between OLAP and OLTP

I've been reviewing a client's architecture, particularly their OLAP system, which is just a regular old snowflake schema on SQL Server. The facts and dimensions are ETL'd in from other transactional systems such as ERP. One thing that jumped out at…
João Bragança • 1,353 • 1 • 13 • 29
0 votes, 2 answers

Is there a sample or template database for BI about sales/inventory?

I will start a reporting tool for some local ERPs covering sales, invoices, orders, customers, inventory, etc. I wonder if there is a sample schema appropriate for this case (in star or snowflake form) so I can have a solid foundation. I have…
mamcx • 15,916 • 26 • 101 • 189
0 votes, 1 answer

Snowflake schema dimension

This is the first time I'm working on a BI project and Pentaho products are not yet familiar to me, so I needed to know if the following models are correct and that I won't face difficulties later when creating hierarchies on the BI Server! Thank…
0 votes, 1 answer

Normalizing Time Series Data

I'm creating a database to store a lot of events. There will be a lot of them and they will each have an associated time that is precise to the second. As an example, something like this: Event ----- Timestamp ActionType (FK) Source (FK) Target…
polm23 • 14,456 • 7 • 35 • 59
0 votes, 2 answers

Koch snowflake implementation in python using pygame

I would like to implement the Koch snowflake using pygame. I am working with the following series of images from http://en.wikipedia.org/wiki/File:KochFlake.svg My algorithm is this: draw a triangle, calculate the points of a triangle one-third…
Hacker • 133 • 5 • 16
0 votes, 1 answer

How can I write a dimension with a snowflake schema? Details below

There are three tables: 1) Student - my fact table (references Addresses with FK ADDRESS_ID); 2) Addresses - this table contains FK COUNTRY_ID, which references COUNTRY; 3) Country - this has a NAME column which I would display (PK IDENTIFIER). I have…
Rajeev A N • 105 • 3 • 9