0

We are in the process of creating a DataWarehouse in Snowflake, basically what our developers have done is by using stitch they have transferred all existing data to a Database in Snowflake and then are updating this further daily.

Now I am in the process of formatting the table structure into something legible for use with PowerBI and 3rd parties, which hasn't been a problem. I have created all the necessary SQL for the tables and or views I wish as well as inserting the existing data this new structure.

However my issue is how do I then, say, update any new data that comes along?

E.g. dev have a table called SN_BARCODE which has all the barcodes for all products in it, the fields in that table are:

   APNS (VARCHAR)
 , BARCODEID (NUMBER)
 , DATECREATED (TIMESTAMP)
 , DATEMODIFIED (TIMESTAMP)
 , DATEUPDATED (TIMESTAMP)
 , ID (NUMBER)
 , PRODUCTID (NUMBER) 
 , VENUEID (NUMBER) 
 , _SDC_BATCHED_AT(TIMESTAMP) 
 , _SDC_RECEIVE_AT (TIMESTAMP)
 , _SDC_SEQUENCE (TIMESTAMP)
 , _SDC_TABLE_VERSION (TIMESTAMP)  

to which I do the following

CREATE OR REPLACE Table pc_stitch_db.Dim_BarCodes
 (Barcodes_Id int, Barcodes_ProductId, Barcodes_APN, PRIMARY KEY (Barcodes_Id)) AS
SELECT
    "PC_STITCH_DB".sn_barcode.barcodeid AS Barcodes_Id,
    "PC_STITCH_DB".sn_product.id AS Barcodes_ProductId,
    "PC_STITCH_DB".sn_barcode.apns AS Barcodes_APN
FROM "PC_STITCH_DB".sn_Barcode
INNER JOIN "PC_STITCH_DB".sn_product  ON (
        "PC_STITCH_DB".sn_product.PRODUCTID = "PC_STITCH_DB".sn_barcode.productid
    AND "PC_STITCH_DB".sn_product.venueid = "PC_STITCH_DB".sn_barcode.venueid
);

How can I update these daily after stitch has loaded the data from the other source?

mkrieger1
  • 19,194
  • 5
  • 54
  • 65

1 Answers1

0

You have the dilemma of simplicity vs. efficiency.

You can easily create TASKs that recreate your tables on eg. a daily schedule using statements like the CREATE OR REPLACE TABLE statement you provided above. This means your data is reloaded completely every day.

However, if you have huge and growing tables that approach eventually will cause capacity problems. Then you have to modify your solution to do incremental updates.

There are two main types of incremental updates,

  1. appending data using INSERT
  2. merging data using MERGE (in general more complex and more flexible)
Hans Henrik Eriksen
  • 2,690
  • 5
  • 12