Questions tagged [snowflake-schema]

**Snowflake schema** is special case of the database Star Schema, where one or many dimension tables are normalized. This is often done for improving the performance in some cases of the Star Schema. When the dimension tables are completely normalized, the resulting structure resembles a snowflake with the fact tables in the middle.

Snowflake schema is a logical arrangement of tables in a multidimensional database in which the dimensions are expressed by more than one table. The dimension tables that relate to the facts table represent the smallest granularity of data-analysis. The other tables in the same dimension represent aggregators of dimension's individual elements. When a dimension is aggregated by several independent criteria, the diagram resembles the shape of a snowflake. Used in data warehouses.

Methods for creating Snowflake Schema out of Star Schema:

  • Partially or fully normalize one or many dimension tables, leaving other dimensions unchanged.
  • Partially or fully normalize all dimension tables.

When to use

Query performance takes the highest significance in a data warehouse and snowflaking can hamper the performance. It very useful for cases, when some attributes like demographic attributes are browsed more then the whole customer table itself. Another case is for precalculated summaries derived from a fact table. This helps not aggregating over and over again since the operation is expensive.

Advantages

Savings in the storage space. Normalized structures are having no insertion, deletion or update anomalies. Easier to update and maintain.

Disadvantages

Schemas are less intuitive, with higher complexity and abstraction. Difficult browsing through the contents for users. Additional joins - worse query performance in some cases.

Example

Having a fact table called "Sales" and dimension table called "Product". Assume there are 500000 product dimension rows. These products can fall under 10 brands. For running query for products, not indexed on brands, the query will have to search through 500000 rows to find all brands.

On the other hand, if the "Product" table partially normalized, separated on another table "Brand" the initial search query would need to go through just 10 rows on.

External links

893 questions
0
votes
0 answers

How to update effective dates for historic data while loading daily data

I am having issue with updating effective end_date for products in scd type 2 where daily current_flag row have the same pack_size and product_code. Current statement does not handle it and it skips the row as there is no match. My example product…
0
votes
1 answer

Snowflake updating duplicate records

I am having staging tables in a snowflake where I am copying data from AWS S3 using snowpipe. Some of those records are a type of creating an event and multiple updates. For same event, there will be one create and multiple update events with…
Dany
  • 2,034
  • 8
  • 34
  • 54
0
votes
2 answers

SNOWPIPE in snowflake is 100% reliable or not?

I have used snowpipe to retrieve data from AWS S3 to Snowflake, but in my case, its not working as expected. Sometimes the files are not processing into snowflake. Is there any alternate methods available for the same?
0
votes
5 answers

Is it possible to run a snowflake sql script without qualifying the database name.?

Is it possible to run a SQL script without qualifying the database name? Currently, we are using the program like below, SELECT I.XXXX_LOC_SKEY, C.COUNTRY_SKEY FROM DEV_XXXX_DB.STAGING.XX_TABLE_LOCATION_SALES …
0
votes
3 answers

How to do I get the max day of each month in sql/snowflake

I’m trying to return The max day of each month from a data set. I say max because it’s based on mon-fri uploads. So if the last day of the month falls on a Saturday(say 10/31/2019) then the last uploaded date will be on the Friday (10/30/2019).…
Ted Mosby
  • 1,426
  • 1
  • 16
  • 41
0
votes
2 answers

SnowFlake Query if an Id exists on each of the last 7 days

We INSERT new records every day to a table with say id and created_on column. How do i identify if records with a particular identifier existed every day in the last 7 days ?
0
votes
2 answers

Trouble concatenating rows into string with multiple joins

I am struggling to create a statement in a snowflake schema. I need to show a list of all products and associated tracks but also include on each line a list of songwriters names and ownership percentages eg. add one column that looks like: "Sam…
0
votes
3 answers

Snowflake Windows Analytical Function to set grouping set

I have following data set for data lake which is acting as source for Dimension where I want to migrate the history data in Dimension For e.g.: image Primarykey Checksum DateFrom Dateto ActiveFlag 1 11 …
0
votes
1 answer

Why is the PLAYFAB_SHARED database no longer available?

I don't have much experience with Snowflake and Playfab, but this is what happened. Connected Playfab and Snowflake to get the Playfab data into the Snowflake console. Everything was working good, I created an additional database, created a schema…
0
votes
1 answer

Snowflake External table requires postional columns issue

I am facing an issue where the extracted files columns could be changing for differents: for example: On day 1, file might have 3 columns : c1,c2,c3 On day 2, file might have 5 columns : c1,c3,c2,c4,c5 notice the column position of c3 in the…
0
votes
2 answers

Data Migration from one DB to another

I have to create an app which transfer data from snowflake to postgres everyday. Some tables in postgres are truncated before migration and all data from corresponding snowflake table is copied. While for other tables, data after last timestamp in…
Saurabh Bhatia
  • 679
  • 2
  • 7
  • 11
0
votes
1 answer

Pivot object not being recognized by snowflake

I have written a SQL code in snowflake to create a pivot table. I am using the object of the pivot table in another query but its not being recognized. create or replace table "SCRATCH"."ASHISHKUMAR".pivoted2 as SELECT * FROM (SELECT …
0
votes
1 answer

Snowflake failover - replication mechanim

I have been diving into the documentation about how to manage the failover/replication use cases within Snowflake. here Basically, the failover strategy is based on the replication feature of a database between two different accounts of the same…
0
votes
2 answers

The date functions in snowflake doesn't seem to work as expected

I can't understand why SELECT YEAROFWEEK('2017-01-01T00:00:00.000+00:00'::timestamp) returns "2016" SELECT WEEK('2017-01-01T00:00:00.000+00:00'::timestamp) returns "52" Can someone help me understand?
Baaju
  • 1,992
  • 2
  • 18
  • 22
0
votes
3 answers

Snowflake - how to display column name using function?

I did not get much help from Snowflake documentation about how I can take give column names using snowflake functions. I have automated report which will do calculation for the given dates. here it is select sum(case when logdate =…
zealous
  • 7,336
  • 4
  • 16
  • 36