
I am using the following query to populate my fact table:

SELECT sh.isbn_l, sh.id_c, sh.id_s, sh.data, sh.quantity, b.price
FROM Book AS b
INNER JOIN Sales AS sh
  ON b.isbn = sh.isbn_l

The main thing is that I want to load the table between two points in time. So if I load today, I should get all the records from the last time I loaded up to now.

And if I load again the day after tomorrow, I should get only the data from after today's load time up to the day after tomorrow.

In other words: NO DUPLICATED ROWS or DATA. What should I do? Any ideas, please?

Thank you in advance

Joe King

Klara
  • I've never used Snowflake, but I assume it's like every other specialised data warehouse database and _does not have_ any magic that does this for you. You need to create a control table that remembers the last time you loaded. Having said that, there does not appear to be any column in your data that indicates this. – Nick.Mc Jun 26 '19 at 00:15
  • Well, Snowflake _does_ have some magic here that might be able to help: https://docs.snowflake.net/manuals/user-guide/streams.html :D – Marcin Zukowski Jun 28 '19 at 01:50

1 Answer


Streams (and maybe Tasks) are your friend here.

A Snowflake Stream records the delta of change data capture (CDC) information for a table (such as a staging table), including inserts and other DML changes. A stream allows querying and consuming a set of changes to a table, at the row level, between two transactional points of time.
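For example, you could create a stream on your `Sales` table and run your existing join against the stream instead of the base table. This is a minimal sketch using the table and column names from your question; `sales_stream` is a name I've made up:

```sql
-- Create a stream that tracks changes to the Sales table
CREATE OR REPLACE STREAM sales_stream ON TABLE Sales;

-- Querying the stream returns only the rows changed since the stream
-- was last consumed by a DML statement
SELECT sh.isbn_l, sh.id_c, sh.id_s, sh.data, sh.quantity, b.price
FROM Book AS b
INNER JOIN sales_stream AS sh
  ON b.isbn = sh.isbn_l
WHERE sh.METADATA$ACTION = 'INSERT';
```

When a DML statement (such as an `INSERT INTO ... SELECT`) consumes the stream, the stream's offset advances, so the next read starts after the previous load — which is exactly the "no duplicated rows" behaviour you're after.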

In a continuous data pipeline, table streams record when staging tables and any downstream tables are populated with data from business applications using continuous data loading and are ready for further processing using SQL statements.

Snowflake Tasks may optionally use table streams to provide a convenient way to continuously process new or changed data. A task can transform new or changed rows that a stream surfaces. Each time a task is scheduled to run, it can verify whether a stream contains change data for a table (using SYSTEM$STREAM_HAS_DATA) and either consume the change data or skip the current run if no change data exists.
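Putting the two together, a task can periodically load the fact table from the stream, and skip the run entirely if nothing has changed. This is only a sketch under assumed names — `load_fact_sales`, `fact_sales`, `my_wh`, and `sales_stream` are all hypothetical:

```sql
-- Hypothetical task: runs hourly, but only does work if the stream has data
CREATE OR REPLACE TASK load_fact_sales
  WAREHOUSE = my_wh              -- assumed warehouse name
  SCHEDULE = '60 MINUTE'
  WHEN SYSTEM$STREAM_HAS_DATA('SALES_STREAM')
AS
  INSERT INTO fact_sales (isbn_l, id_c, id_s, data, quantity, price)
  SELECT sh.isbn_l, sh.id_c, sh.id_s, sh.data, sh.quantity, b.price
  FROM sales_stream AS sh
  INNER JOIN Book AS b
    ON b.isbn = sh.isbn_l;

-- Tasks are created suspended; resume to start the schedule
ALTER TASK load_fact_sales RESUME;
```

Because the `INSERT ... SELECT` consumes the stream, each run picks up only the rows that arrived since the previous run.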

Users can define a simple tree-like structure of tasks that executes consecutive SQL statements to process data and move it to various destination tables.

https://docs.snowflake.com/en/user-guide/data-pipelines-intro.html

Robert Long