
I currently have a Snowflake table being updated by a Kafka connector in near real time. I want to be able to take these new data entries, also in near real time, through something such as Snowflake CDC / Snowflake Streams and append some additional fields. Some of these will track max values within a certain time period (probably a window function), and others will pull values from static tables based on static_table.id = realtime_table.id.

The final goal is to perform these transformations and write the results to a new presentation-level table, so I have both a source table and a presentation-level table with little latency between the two.
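For illustration, the shape of transformation I'm after is roughly the following (all table and column names here are made up):

select
    rt.id,
    rt.event_ts,
    rt.value,
    st.category,  -- value looked up from a static table on matching id
    max(rt.value) over (
        partition by rt.id, date_trunc('hour', rt.event_ts)
    ) as max_value_this_hour  -- max within a time period (here, per hour)
from realtime_table rt
join static_table st
    on st.id = rt.id;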

Is this possible with Snowflake Streams? Or is there a combination of tools Snowflake offers that can be used to achieve this goal? Due to a number of outside constraints, it is important that this can be done within the Snowflake infrastructure.

Any help would be much appreciated :).

I have considered using a materialised view, but I am concerned about cost and latency.

1 Answer


Streams, together with Tasks, are designed for exactly the kind of transformations you are asking for.
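As a minimal sketch, the first step is a stream on your Kafka-fed table that captures new rows as they land (REALTIME_TABLE is a placeholder name for your source table):

-- Capture changes landing in the source table.
-- APPEND_ONLY is appropriate when the Kafka connector only inserts rows.
create or replace stream REALTIME_TABLE_STREAM
    on table REALTIME_TABLE
    append_only = true;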

This is a quickstart to start growing your Streams and Tasks abilities:

In step 6 you can see a task that transforms the data as it arrives:

create or replace task REFINE_TASK
    USER_TASK_MANAGED_INITIAL_WAREHOUSE_SIZE = 'XSMALL'
    SCHEDULE = '4 minute'
    COMMENT = '2. ELT Process New Transactions in Landing/Staging Table into a more Normalized/Refined Table (flattens JSON payloads)'
when
    SYSTEM$STREAM_HAS_DATA('CC_TRANS_STAGING_VIEW_STREAM')
as
insert into CC_TRANS_ALL
select
    card_id, merchant_id, transaction_id, amount, currency, approved, type, timestamp
from CC_TRANS_STAGING_VIEW_STREAM;
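Adapted to your case, a task consuming the stream can do the lookup join and the windowed max in one statement and land the result in the presentation table. This is only a sketch under the assumptions above; REALTIME_TABLE_STREAM, STATIC_TABLE, and PRESENTATION_TABLE are placeholder names:

-- Runs at most once a minute, and only when the stream has new rows.
create or replace task ENRICH_TASK
    USER_TASK_MANAGED_INITIAL_WAREHOUSE_SIZE = 'XSMALL'
    SCHEDULE = '1 minute'
when
    SYSTEM$STREAM_HAS_DATA('REALTIME_TABLE_STREAM')
as
insert into PRESENTATION_TABLE
select
    s.id,
    s.event_ts,
    s.value,
    st.category,  -- field pulled from the static table
    max(s.value) over (
        partition by s.id, date_trunc('hour', s.event_ts)
    ) as max_value_this_hour  -- max within the time window
from REALTIME_TABLE_STREAM s
join STATIC_TABLE st
    on st.id = s.id;

-- Tasks are created suspended, so remember to resume.
alter task ENRICH_TASK resume;

One caveat: a window function over the stream only sees the rows in that batch, not rows already moved to the presentation table. If the max must span earlier rows too, compute it against the target table (or maintain it with a merge) instead.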
Felipe Hoffa