I'm investigating taking data from Kafka -> Snowflake Kafka connector -> Snowflake. Unfortunately, the connector seems to use just two columns (and puts the entire JSON payload into a single column). So I created a stream/task to periodically copy data from the landing table to the destination table (using insert). Everything works beautifully except for deleting data in the landing table once it has landed in the destination table. Using streams, I know what has landed, but how do I then delete that data from the landing table? Truncate seems so much faster. Do I just periodically run a delete task that removes these entries? I am also concerned about the warehouse time needed to perform these deletes. Thanks
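For context, a rough sketch of the setup described in the question, with hypothetical table, stream, task, warehouse, and column names (the connector creates the two-column landing table with RECORD_METADATA and RECORD_CONTENT itself; the column mapping into the destination is only an example):

-- Stream that tracks rows as the connector lands them.
create or replace stream kafka_landing_stream on table kafka_landing;

-- Task that periodically flattens the JSON payload into the destination table.
create or replace task copy_to_destination
    warehouse = my_wh
    schedule = '5 minute'
    when system$stream_has_data('KAFKA_LANDING_STREAM')
as
    insert into destination_table (id, event_ts, payload)
    select record_content:id::number,
           record_content:event_ts::timestamp_ntz,
           record_content
    from kafka_landing_stream
    where metadata$action = 'INSERT';

-- Tasks are created suspended; resume to start the schedule.
alter task copy_to_destination resume;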
- Did you ever find a solution that worked for you? If so, can you share? – Looter Dec 03 '20 at 20:46
1 Answer
When multiple statements (insert, delete, etc.) need to consume the same change records, surround them in an explicit transaction (BEGIN .. COMMIT); this locks the stream so every statement in the transaction sees the same set of changes.
One approach: add an extra flag column to the staging table. Inside the transaction, use the stream to insert the new rows into the target table, then use the same stream in a second MERGE back into the staging table to set the flag. After the commit, delete the flagged rows from the staging table.
https://docs.snowflake.com/en/user-guide/streams.html#examples
-- Explicit transaction: every statement reads the same stream offset,
-- and the stream only advances when the transaction commits.
begin;
select * from staging_stream;  -- optional: inspect the pending changes
-- 1. Copy the new rows into the target table.
insert into target_table
    select <columns>
    from staging_stream
    where metadata$action = 'INSERT' and flag = 0;
-- 2. Flag the same rows in the staging table so they can be deleted later.
merge into staging_table st
using (
    select <key_column>
    from staging_stream
    where flag = 0
) sc
on st.<key_column> = sc.<key_column>
when matched then update set st.flag = 1;
commit;
-- 3. Outside the transaction, remove the rows that were copied.
delete from staging_table where flag = 1;
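One way to run this on a schedule (a sketch only, with hypothetical names): since a task body is a single SQL statement, wrap the transaction in a stored procedure and have a task call it whenever the stream has data. This assumes the landing table has the extra flag column with a default of 0, and that the Kafka partition/offset fields in RECORD_METADATA can serve as the merge key.

create or replace procedure flush_staging()
returns varchar
language sql
as
$$
begin
    begin transaction;
    -- Copy new rows to the destination (adjust the column mapping as needed).
    insert into target_table
        select record_content
        from staging_stream
        where metadata$action = 'INSERT' and flag = 0;
    -- Flag the same rows in the landing table, keyed on Kafka partition/offset.
    merge into staging_table st
    using (
        select record_metadata:partition::int as part,
               record_metadata:offset::int    as off
        from staging_stream
        where flag = 0
    ) sc
    on st.record_metadata:partition::int = sc.part
       and st.record_metadata:offset::int = sc.off
    when matched then update set st.flag = 1;
    commit;
    -- Outside the transaction, remove what was copied.
    delete from staging_table where flag = 1;
    return 'flushed';
end;
$$;

create or replace task flush_staging_task
    warehouse = my_wh
    schedule = '5 minute'
    when system$stream_has_data('STAGING_STREAM')
as
    call flush_staging();

alter task flush_staging_task resume;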

Abhi Reddy
- What do I set for the conditional? I already use "when SYSTEM$STREAM_HAS_DATA('kafka_check')" for my insertion task. How do I then say delete from record 0 to the current record in the underlying table, where current is the record I just consumed in the stream that tracks that table? I think my problem is how to translate the stream state back to the underlying table to do an efficient delete? Thanks – Surendar Chandra May 12 '20 at 22:22
- Another workaround would be to stop the Kafka connector, consume the stream, truncate the staging table, recreate the stream, and turn the connector on again. – Abhi Reddy May 13 '20 at 00:52
- Would it work to just pause the Snowpipe rather than stopping the Kafka connector? Or does that cause errors within the connector? If pausing works, I would see the path as: pause pipe -> consume data -> truncate -> recreate stream -> resume Snowpipe. We are trying to solve the exact same issue. – Looter Dec 03 '20 at 20:45
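For reference, a rough sketch of the pause/truncate/recreate sequence discussed in the two comments above, with hypothetical object names. Whether the connector tolerates a paused pipe is the open question raised in the last comment; this only shows the Snowflake side.

-- 1. Pause ingestion (either stop the connector, or pause the pipe it uses).
alter pipe my_kafka_pipe set pipe_execution_paused = true;
-- 2. Drain what the stream has already captured into the destination.
insert into target_table
    select record_content
    from staging_stream
    where metadata$action = 'INSERT';
-- 3. Clear the landing table and recreate the stream on the now-empty table.
truncate table staging_table;
create or replace stream staging_stream on table staging_table;
-- 4. Resume ingestion.
alter pipe my_kafka_pipe set pipe_execution_paused = false;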