Questions tagged [snowflake-schema]

**Snowflake schema** is special case of the database Star Schema, where one or many dimension tables are normalized. This is often done for improving the performance in some cases of the Star Schema. When the dimension tables are completely normalized, the resulting structure resembles a snowflake with the fact tables in the middle.

Snowflake schema is a logical arrangement of tables in a multidimensional database in which the dimensions are expressed by more than one table. The dimension tables that relate to the facts table represent the smallest granularity of data-analysis. The other tables in the same dimension represent aggregators of dimension's individual elements. When a dimension is aggregated by several independent criteria, the diagram resembles the shape of a snowflake. Used in data warehouses.

Methods for creating Snowflake Schema out of Star Schema:

  • Partially or fully normalize one or many dimension tables, leaving other dimensions unchanged.
  • Partially or fully normalize all dimension tables.

When to use

Query performance takes the highest significance in a data warehouse and snowflaking can hamper the performance. It very useful for cases, when some attributes like demographic attributes are browsed more then the whole customer table itself. Another case is for precalculated summaries derived from a fact table. This helps not aggregating over and over again since the operation is expensive.

Advantages

Savings in the storage space. Normalized structures are having no insertion, deletion or update anomalies. Easier to update and maintain.

Disadvantages

Schemas are less intuitive, with higher complexity and abstraction. Difficult browsing through the contents for users. Additional joins - worse query performance in some cases.

Example

Having a fact table called "Sales" and dimension table called "Product". Assume there are 500000 product dimension rows. These products can fall under 10 brands. For running query for products, not indexed on brands, the query will have to search through 500000 rows to find all brands.

On the other hand, if the "Product" table partially normalized, separated on another table "Brand" the initial search query would need to go through just 10 rows on.

External links

893 questions
4
votes
3 answers

How to create a csv file format definition to load data into snowflake table

I have a CSV file, a sample of it looks like this: Image of CSV file Snowpipe is failing to load this CSV file with the following error: Number of columns in file (5) does not match that of the corresponding table (3), use file format option…
Venkat
  • 41
  • 1
  • 2
4
votes
1 answer

Does Snowflake has some issue with Double Precision, Float, Real Data Type

Recently I migrated data from netezza to Snowflake. The problem I am facing is with double precision data type. In netezza DB double precision field has value of .34757853258953 but when I query same in Snowflake UI I see it as .3475785326. Anyone…
rjain550
  • 105
  • 2
  • 8
4
votes
2 answers

Can Snowflake Procedure can be written in python instead of javascript?

Can Snowflake Procedure can be written in python instead of javascript? I went through Snowflake document it says javascript but was looking for options other then javascript.
4
votes
1 answer

How to connect with snowflake using sso authentication using DBT

I am trying to connect with snowflake using DBT with SSO Authentication Mechanism. For this an external browser window should open for passing the credentials. To connect with snowflake I build the profile.yml file as below test_dbt_snowflake: …
4
votes
1 answer

Flatten data source in Snowflake from Array

I am trying to fix an array in a dataset. Currently, I have a data set that has a reference number to multiple different uuids. What I would like to do is flatten this out in Snowflake to make it so the reference number has separate row for each…
4
votes
2 answers

Liquibase trying to insert data into a column using incorrect data type

I am trying to wire up liquibase to be used with Snowflake. I got it to build and start up. It creates the DatabaseChangeLog and DatabaseChangeLogLock tables. But when trying to insert data into the DatabaseChangeLog table I get the following…
4
votes
2 answers

Adding row IDs to table in Snowflake SQL

I need to add a column with unique integers/hashes to a table to serve as the row id. I created a table DUPLICATES with this query: CREATE TABLE duplicates AS SELECT "a", "b", COUNT(*) as _count FROM "table" GROUP BY "a", "b" HAVING …
RadRuss
  • 484
  • 2
  • 6
  • 16
3
votes
0 answers

SNOWFLAKE - Single file generation even though the max size exceeds

Team, I have a query to discuss: Table to file creation in snowflake We have used the max file size with the compression, but at some times the data which is retrieving and size it is storing in the file getting exceeds. But we remove the…
3
votes
1 answer

Split single row value to multiple rows dynamically in Snowflake

split unstructured address into multiple rows using snowflake. consider the table col_A 4402, 4420, 4330, 4502 hecson Blvd SW 2643-2714 Nargay Matle Ct, 2685-2733 Osase Ci 4-60 Brook Ave, 2-55 Day Drive, 6-90 Gale Dr, 27-87 Moile Road,…
3
votes
1 answer

Setting up Docker image with R and Snowflake Drivers

Requirement: To connect and execute the RScript with connection from R to SnowflakeDB I am trying to set up a Docker image that can communicate with a Snowflake database through R (either using RODBC or ODBC) Error: The problem seems to be that It…
3
votes
1 answer

Just wanted to check if these two queries give same result or different result

Query 1: SELECT id,COUNT(X.C_NO) as count FROM table X WHERE X.date = '2022-02-02' and P_NO is not null group by id; Query 2: SELECT id, sum(CASE WHEN C_NO IS NOT NULL and P_NO is not null THEN 1 ELSE 0 END) as count …
Jenifer
  • 387
  • 2
  • 11
3
votes
4 answers

COPY INTO in snowflake throws table does not exist
I am trying to copy data to snowflake from local and I am getting snowflake.connector.errors.ProgrammingError: 001757 (42601): SQL compilation error: Table 'RAW_DATA' does not exist the same code is working in Jupiter notebook but it doesn't work…
3
votes
1 answer

Write alias for columns with same names from multiple pivots in snowflake

My query is SELECT * FROM ( WITH as1 AS ( SELECT DISTINCT a.COMPANY_NAME, b.industry_sector, b.INDUSTRY_GROUP, c.VERTICAL, D.VCROUND, D.VCROUND AS VCROUND1, D.VCROUND AS VCROUND2, D.VCROUND AS VCROUND3, …
Xi12
  • 939
  • 2
  • 14
  • 27
3
votes
1 answer

How to grant access to PROCEDURE_DEFINITION in Snowflake from ADMIN role to another role?

I have created a stored procedure 'XYZ' in "ADMIN" schema by using the role "ADMIN_ROLE" and have granted a "USAGE" permission from "ADMIN" account to another role which is "ABC_ROLE". Statement used: GRANT USAGE ON ALL PROCEDURES IN SCHEMA…
3
votes
1 answer

How to return result from a stored procedure to TASK RETURN_VALUE in snowflake?

I would like to return logging and status messages from a stored procedure to the TASK that calls it. create or replace procedure status_return() returns string not null language javascript as $$ var result_status = 'The return status…
1
2
3
59 60