0

I have used snowpipe to retrieve data from AWS S3 to Snowflake, but in my case, its not working as expected. Sometimes the files are not processing into snowflake.

Is there any alternate methods available for the same?

2 Answers2

2

The event handling from AWS S3 has been said to be unreliable in the way that events might arrive several minutes late (this is an AWS issue, but affects Snowpipe).

The remedy is to schedule a task to periodically (minimum daily) do:

ALTER PIPE my_pipe REFRESH [ PREFIX = '<path>' ];

Please use a prefix to avoid scanning large S3 buckets for unprocessed items. Also watch for announcements from Snowflake about when the S3 event issue is fixed by Amazon, so you can delete any unnecessary REFRESH tasks.

If you have eg. a YYYY/MM/DD/ bucket structure this unfortunately means you have to create a Stored Procedure to run the command with a dynamic PREFIX...

I use this combination (PIPE/REFRESH TASK) for my Snowpipes.

Hans Henrik Eriksen
  • 2,690
  • 5
  • 12
  • Then what are your thoughts on the Snowflake Documentation "Important" note: "The REFRESH functionality is intended for short term use to resolve specific issues when Snowpipe fails to load a subset of files and is not intended for regular use." – David Garrison Dec 13 '19 at 17:48
  • 1
    It's the same "specific issues" we are addressing here. Those issues have been present since snowpipes became available, so the "short term use" limitation is stretched beyond any reasonable meaning by now. But of course users should remove all their extra pipe refresh tasks when S3 events are guaranteed and timely. – Hans Henrik Eriksen Dec 14 '19 at 15:41
0

To answer your question: Yes. I've used it in the past on multiple occasions in production (AWS) and it has worked as expected.

Simon D
  • 5,730
  • 2
  • 17
  • 31