Questions tagged [snowflake-schema]

**Snowflake schema** is a special case of the database Star Schema, in which one or more dimension tables are normalized. This is often done to improve performance in certain cases of the Star Schema. When the dimension tables are completely normalized, the resulting structure resembles a snowflake with the fact tables in the middle.

A snowflake schema is a logical arrangement of tables in a multidimensional database in which the dimensions are expressed by more than one table. The dimension tables that relate directly to the fact table represent the finest granularity of data analysis. The other tables in the same dimension represent aggregations of the dimension's individual elements. When a dimension is aggregated by several independent criteria, the diagram resembles the shape of a snowflake. Snowflake schemas are used in data warehouses.

Methods for creating a Snowflake Schema out of a Star Schema:

  • Partially or fully normalize one or more dimension tables, leaving the other dimensions unchanged (see the sketch after this list).
  • Partially or fully normalize all dimension tables.
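
As a minimal sketch of the first method, assume a hypothetical denormalized DIM_CUSTOMER dimension whose demographic attributes are moved into their own table; the table and column names below are illustrative only, not from any particular source.

```
-- Star schema: one denormalized customer dimension (hypothetical names)
CREATE TABLE DIM_CUSTOMER (
    customer_key   INTEGER PRIMARY KEY,
    customer_name  VARCHAR(100),
    age_band       VARCHAR(20),
    income_band    VARCHAR(20),
    gender         VARCHAR(10)
);

-- Snowflake schema: the demographic attributes are normalized out into
-- their own table and referenced from the customer dimension by a key
CREATE TABLE DIM_DEMOGRAPHIC (
    demographic_key INTEGER PRIMARY KEY,
    age_band        VARCHAR(20),
    income_band     VARCHAR(20),
    gender          VARCHAR(10)
);

CREATE TABLE DIM_CUSTOMER_SNOWFLAKED (
    customer_key    INTEGER PRIMARY KEY,
    customer_name   VARCHAR(100),
    demographic_key INTEGER REFERENCES DIM_DEMOGRAPHIC (demographic_key)
);
```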

When to use

Query performance is of the highest significance in a data warehouse, and snowflaking can hamper it. Snowflaking is nevertheless useful when some attributes, such as demographic attributes, are browsed more often than the whole customer table itself. Another case is precalculated summaries derived from a fact table; storing these avoids aggregating the same data over and over again, which is an expensive operation.
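
As a sketch of the precalculated-summary case, the example below assumes a hypothetical SALES fact table with product_key, sale_month and amount columns, aggregates it once into a small summary table, and lets reports read the summary instead of re-aggregating the fact table.

```
-- Hypothetical fact table: SALES(product_key, sale_month, amount)

-- Run the expensive aggregation once and store the result
CREATE TABLE SALES_MONTHLY_SUMMARY AS
SELECT product_key,
       sale_month,
       SUM(amount) AS total_amount
FROM SALES
GROUP BY product_key, sale_month;

-- Reports then query the small summary table instead of the fact table
SELECT sale_month, total_amount
FROM SALES_MONTHLY_SUMMARY
WHERE product_key = 42
ORDER BY sale_month;
```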

Advantages

Savings in storage space. Normalized structures have no insertion, deletion, or update anomalies, so they are easier to update and maintain.

Disadvantages

Schemas are less intuitive, with higher complexity and abstraction. Browsing through the contents is more difficult for users. The additional joins lead to worse query performance in some cases.

Example

Consider a fact table called "Sales" and a dimension table called "Product". Assume there are 500,000 product dimension rows and that these products fall under 10 brands. A query for all brands against the products, with no index on brand, has to scan all 500,000 rows.

On the other hand, if the "Product" table is partially normalized, with the brand separated out into another table "Brand", the same query only needs to go through 10 rows.
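
The sketch below shows one way that normalization could look, using the table names from the example and assuming the "Sales" fact table carries a product_key and an amount column; a query for all brands then scans the 10-row "Brand" table instead of the 500,000-row "Product" table.

```
-- Snowflaked product dimension, using the example's table names
CREATE TABLE BRAND (
    brand_key  INTEGER PRIMARY KEY,
    brand_name VARCHAR(100)                            -- roughly 10 rows
);

CREATE TABLE PRODUCT (
    product_key  INTEGER PRIMARY KEY,
    product_name VARCHAR(200),
    brand_key    INTEGER REFERENCES BRAND (brand_key)  -- roughly 500,000 rows
);

-- Finding every brand now touches 10 rows instead of 500,000
SELECT brand_name FROM BRAND;

-- Sales by brand joins through the snowflaked dimension
SELECT b.brand_name, SUM(s.amount) AS total_sales
FROM SALES s
JOIN PRODUCT p ON p.product_key = s.product_key
JOIN BRAND   b ON b.brand_key   = p.brand_key
GROUP BY b.brand_name;
```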

893 questions
3 votes, 2 answers

DBT to add comments to snowflakes columns

We use DBT for ELT in snowflake. Want to add comments to each column in Snowflake. Either using COMMENT or ALTER command after every full refresh. Decided to add macros with commands and call it under on-run-end hook. {% macro…
Kavya shree
3 votes, 2 answers

Calculate cumulative percentiles using SQL for a group/partition

I want to calculate the cumulative percentiles for a given partition/group in SQL. For example the input data looks like - CustID Product ID quantity_purchased 1 111 2 2 111 3 3 …
User
3 votes, 1 answer

Create date array from start date and end date columns in Snowflake

I want to create an array of dates from two columns in my table in Snowflake. In the example below, how can I add a third columns that contains an array of all dates between start_date and end_date? It can work using a UDF but I do not have…
3 votes, 1 answer

Way to automatically create SnowFlake table based on inferred field types from API Endpoint? (Python)

Say I have a dataframe that has a row like: {'ID':'123245','Comment':'This is my longer comment','Tax':1.07,'Units':2.0} Is there a way in Python to do something like: max([len(str(i)) for i in set(df['Comments'])]) And infer the max varchar and…
3 votes, 2 answers

Error while reading data, error message: CSV table references column position 174, but line starting at position:136868 contains only 94 columns

I'm trying to unload data from Snowflake to GCS and then GCS to bq table. Here is the Code for unloading data from snowflakes. ```copy into @unload_stage/FF_TBL_UNLOAD20200906.csv.gz from ( select * from SF_DB.SF_TBLS.FF_TBL_UNLOAD ) file_format =…
3 votes, 3 answers

I cannot seem to get pivot to work in SQL Snowflake

So I can't figure out how to get tables to pivot in Snowflake. I'm trying to model the help page here https://docs.snowflake.com/en/sql-reference/constructs/pivot.html. In this example, I'm only pulling 3 columns, the step, the parameter being…
3 votes, 1 answer

How to run snowflake side effect functions like SYSTEM$GENERATE_SCIM_ACCESS_TOKEN within a procedure with owner rights?

Basically I want to do SCIM integration in snowflake. For that I have to use this command for getting the token which will be passed to Azure AD: call system$generate_scim_access_token(''); This command can only run with AccountAdmin. And…
3 votes, 1 answer

What is the point of Snowflake's Unique constraint?

Snowflake offers a Unique constraint but doesn't actually enforce it. I have an example below showing that with a test table. What is the point, what value does the constraint add? What workarounds do people use to avoid duplicates? I can perform a…
jacobvoller.com
3 votes, 1 answer

Alter column set default unsupported feature

I want to alter the table and set the default sequence of a column which is identity. When I try to run ALTER TABLE report.test_table MODIFY id set default test_table_seq.NEXTVAL; it shows following error: [0A000][2] Unsupported feature 'Alter…
3 votes, 1 answer

Snowflake - add sequence to an existing column

I have to migrate a SQL server database to Snowflake, most of my dimension tables have an identity column as PK, these columns are then referenced across multiple facts tables. I am planning on copying these tables in snowflake however I need to…
Bastien
3 votes, 2 answers

GRANT EXECUTE permission to ALL STORED PROCEDURES in snowflake

GRANT EXECUTE permission to ALL STORED PROCEDURES in snowflake. I have create a stored procedure in the snowflake database but I am getting error while trying to execute that stored procedure. create or replace procedure…
Jeyavel
3 votes, 2 answers

Can Snowflake Auto Purge records older than X number of days?

For an existing table in snowflake is there a way we can set TTL for each record ? In other words can i ensure records updated/created more than 90 days ago is automatically purged periodically.
Vinod Jayachandran
3 votes, 1 answer

Dimensional Modeling - Common Attribute Used In Various Dimensions Composite Keys

I have a situation here that I have not faced before. I have multiple instances of the same ERP system, differing by satellite locale. Each locale is assigned their own ID. Within each satellite location, the DB schema is the same as the others,…
3 votes, 2 answers

Data Warehouse: Working with accumulated data

Our data warehouse uses accumulated data (and there is no way to invert the accumulation) from a data source to create a snowflake schema. A requirement we have to cope with is that our schema must be usable to create reports based on date…
2 votes, 3 answers

To Calculate Value based on the stock available On Snowflake

I have the below table ID Week_No Value Available_Stock 1 1 200 1000 1 2 300 1000 1 3 100 1000 1 4 400 1000 1 5 500 1000 I want a table like this ID SUM(VALUE) Stock_Lasts_Weeks Available_Stock 1 800 4 Weeks 1000 As…