
I am planning to use Snowpipe to load data from Kafka, but the support team monitoring the pipe jobs needs an alert mechanism.

How can I implement an alert mechanism for Snowpipe via email/slack/etc?

Marco Roy

4 Answers


The interface Snowflake provides between the database and its surroundings is mainly cloud storage. There is no out-of-the-box integration with messaging apart from cloud storage events.

All other integration and messaging must be provided by client solutions.

Snowflake also provides scheduled tasks that can be used for monitoring purposes, but the interface limitations are the same as described above.

Snowflake is a database-as-a-service and relies on other (external) cloud services for a complete systems solution.

This is different from installing your own copy of database software on your own compute resources, where you can install any software alongside the database.

Hans Henrik Eriksen

Please correct my understanding if anything I say is incorrect. I believe Snowpipe is great for continuous data loading, but there is little or no way to track all the errors in the source file. As mentioned in the previous answers, we could build a visualization querying COPY_HISTORY and/or PIPE_USAGE_HISTORY, but it doesn't give you ALL THE ERRORS in the source file. It only gives you summary columns related to the errors (such as the first error message and the error counts):

[screenshot of COPY_HISTORY output showing the error summary columns]

PIPE_USAGE_HISTORY will tell you nothing about the errors in the source file.

The only function that can be helpful (for returning all errors) is the VALIDATE table function in INFORMATION_SCHEMA, but it only validates loads performed via COPY INTO.
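For example, after a manual load you can pull every error from the most recent COPY INTO job with something like this (a sketch; my_table is a placeholder):

```sql
-- Return all errors from the last COPY INTO executed in the current session
-- ('_last' is a shorthand for the most recent load in this session)
SELECT *
FROM TABLE(VALIDATE(my_table, JOB_ID => '_last'));
```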

There is a similar function for pipes called VALIDATE_PIPE_LOAD, but according to the documentation it returns only the first error. Snowflake says "This function returns details about ANY errors encountered during an attempted data load into Snowflake tables.", yet the ERROR output column contains only the first error in the source file.
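To see what it returns for a pipe, you can run something like this (a sketch; the pipe name is a placeholder):

```sql
-- Check load errors reported for a pipe over the last hour
SELECT *
FROM TABLE(INFORMATION_SCHEMA.VALIDATE_PIPE_LOAD(
  PIPE_NAME  => 'my_db.my_schema.my_pipe',
  START_TIME => DATEADD('hour', -1, CURRENT_TIMESTAMP())));
```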

So here is my question: if any of you have successfully used Snowpipe to load data in a real-time production environment, how are you doing error handling and alerting?

Compared to Snowpipe, I think using COPY INTO within a stored procedure, having a shell script call that procedure, and scheduling the script with an enterprise scheduler like Autosys/Control-M is a more streamlined solution.

Using external functions, streams, and tasks for alerting may be an elegant solution, but again I am not sure it solves the problem of error tracking.

ajcoder
    Preferably, instead of taking a screenshot of text, please post the information as text, and then use Markdown to format it. You can even create tables using Markdown. – Jeremy Caney Sep 25 '21 at 00:13
  • Alerting and diagnostic/debugging are two separate activities. Don't conflate the two. `COPY_HISTORY` will let you know whether or not there has been an error. You can then diagnose/debug this error yourself by looking at the data in your file, or by running manual `COPY INTO` commands. – Marco Roy Apr 27 '22 at 22:42
  • Also, as you mentioned, you don't need to use Snowpipe -- but it is much cheaper than running `COPY INTO` commands, because the latter requires an active warehouse to load the data. Snowpipe doesn't, and hence can ingest data at a fraction of the cost. – Marco Roy Apr 27 '22 at 22:43

EDIT (2022-04-27): Snowflake now officially supports Error Notifications for Snowpipe (currently in Public Preview, for AWS only).
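A minimal sketch of the AWS setup (the integration, pipe, table, stage names, and ARNs below are all placeholders):

```sql
-- Outbound notification integration pointing at an SNS topic
CREATE NOTIFICATION INTEGRATION pipe_error_int
  ENABLED = TRUE
  TYPE = QUEUE
  NOTIFICATION_PROVIDER = AWS_SNS
  DIRECTION = OUTBOUND
  AWS_SNS_TOPIC_ARN = 'arn:aws:sns:us-east-1:123456789012:snowpipe-errors'
  AWS_SNS_ROLE_ARN  = 'arn:aws:iam::123456789012:role/snowpipe-errors';

-- Attach the integration to the pipe so load errors are pushed to the topic
CREATE OR REPLACE PIPE my_db.my_schema.my_pipe
  AUTO_INGEST = TRUE
  ERROR_INTEGRATION = pipe_error_int
  AS COPY INTO my_db.my_schema.my_table
     FROM @my_db.my_schema.my_stage;
```

From the SNS topic you can then fan out to email, Slack webhooks, etc.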

"Monitoring" and "alert mechanism" are very broad terms. What do you want to monitor? What should trigger the alerts? The answer can only be as good as the question, so adding more details would be helpful.

As Hans mentioned in his answer, any solution would require the use of systems external to Snowflake. However, Snowflake can be the source of the alerts by leveraging external functions or notification integrations.

Here are some options:

If you want to monitor Snowpipe's usage or performance:

  • You could query the PIPE_USAGE_HISTORY function on a schedule, or build a dashboard/visualization on top of it.
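A sketch of such a query (the pipe name is a placeholder):

```sql
-- Snowpipe credit and volume usage for one pipe over the last day
SELECT start_time, end_time, credits_used, bytes_inserted, files_inserted
FROM TABLE(INFORMATION_SCHEMA.PIPE_USAGE_HISTORY(
  DATE_RANGE_START => DATEADD('day', -1, CURRENT_TIMESTAMP()),
  PIPE_NAME        => 'my_db.my_schema.my_pipe'));
```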

If you want to be alerted about data loading issues:

  • You could create a data test against COPY_HISTORY in DBT, and schedule it to run on a regular basis in DBT Cloud.
  • Alternatively, you could create a task that calls a procedure on a schedule. Your procedure would check COPY_HISTORY first, then call an external function to report failures (see the sketch after the notes below).
Some notes about COPY_HISTORY:
  • Please be aware of the limitations described in the documentation (in terms of the privileges required, etc.)
  • Because COPY_HISTORY is an INFORMATION_SCHEMA function, it can only operate on one database at a time.
    • To query multiple databases at once, UNION could be used to combine the results.
  • COPY_HISTORY can be used for alerting only, not for diagnostics. Diagnosing data load errors is another topic entirely (the VALIDATE_PIPE_LOAD function is probably a good place to start).
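Here is a sketch of that task-plus-procedure option (MY_TABLE, my_wh, and the my_alert_udf external function are placeholders):

```sql
-- Procedure: check COPY_HISTORY for recent failures, alert if any are found
CREATE OR REPLACE PROCEDURE check_copy_errors()
RETURNS VARCHAR
LANGUAGE SQL
AS
$$
DECLARE
  failures INTEGER;
BEGIN
  SELECT COUNT(*) INTO :failures
  FROM TABLE(INFORMATION_SCHEMA.COPY_HISTORY(
    TABLE_NAME => 'MY_TABLE',
    START_TIME => DATEADD('minute', -15, CURRENT_TIMESTAMP())))
  WHERE error_count > 0;

  IF (failures > 0) THEN
    -- my_alert_udf is a placeholder for an external function (email/Slack/etc.)
    LET ignored VARIANT := (SELECT my_alert_udf('Snowpipe load failures: ' || :failures));
  END IF;

  RETURN 'failures found: ' || failures;
END;
$$;

-- Task: run the check every 15 minutes
CREATE OR REPLACE TASK copy_error_check
  WAREHOUSE = my_wh
  SCHEDULE = '15 MINUTE'
AS
  CALL check_copy_errors();

ALTER TASK copy_error_check RESUME;
```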

If you want to be immediately notified of every successful data load performed by Snowpipe:

  • Create an external function to send notifications/alerts to your service(s) of choice.
  • Create a stream on the table that Snowpipe loads into.
  • Add a task that runs every minute, but only when the stream contains data, and have it call your external function to send out the alerts/notifications (see the sketch below).
  • EDIT: This solution does not provide alerting for errors - only for successful data loads! To send alerts for errors, see the solutions above ("If you want to be alerted about data loading issues").
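A sketch of that setup (the table, stream, task, and warehouse names are placeholders, and send_slack_alert stands in for an external function like the one in Allen's answer):

```sql
-- Stream on the table that Snowpipe loads into
CREATE OR REPLACE STREAM raw_events_stream ON TABLE raw_events;

-- Audit table that receives one row per notification sent
CREATE TABLE IF NOT EXISTS load_audit (alert_response VARIANT, notified_at TIMESTAMP);

-- Task: fires every minute, but only when the stream actually has data
CREATE OR REPLACE TASK notify_on_load
  WAREHOUSE = my_wh
  SCHEDULE = '1 MINUTE'
  WHEN SYSTEM$STREAM_HAS_DATA('RAW_EVENTS_STREAM')
AS
  -- Inserting from the stream consumes it (advances the offset);
  -- send_slack_alert is a placeholder external function
  INSERT INTO load_audit (alert_response, notified_at)
  SELECT send_slack_alert('Snowpipe loaded ' || COUNT(*) || ' new rows'),
         CURRENT_TIMESTAMP()
  FROM raw_events_stream;

ALTER TASK notify_on_load RESUME;
```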
Marco Roy
  • Question on the solution given for immediate notification: if the load fails, then the stream (on the load table) won't have data, so how will the task get called? Is there any document or link with elaborated steps on how to do this? Thanks in advance. – danD Nov 03 '21 at 02:57
  • The immediate notifications will only notify you of successful data loads. Failed data load wouldn't insert any data into the table, hence the task wouldn't get triggered. Documentation is in the provided links. You can also take a look at the links @Allen provided in his answer. – Marco Roy Nov 03 '21 at 22:56
  • Please correct me if I am wrong: I am thinking of putting the stream on the COPY_HISTORY table instead of the load table, and having the task filter where the user is Snowpipe and the status is failed, then call the external function. That way I can report the job failure. I believe the only downside is that the task will take more time than usual, since it has to filter all loads. – danD Nov 04 '21 at 05:58
  • For something like that, you don't even need a stream. Just create a task that calls a procedure on a schedule. Your procedure can check `COPY_HISTORY` first, then call the external function if needed to report a failure. – Marco Roy Apr 27 '22 at 22:07

Both email and Slack alerts can be implemented via external functions.
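For example, on AWS you could put a Lambda behind API Gateway and expose it to Snowflake roughly like this (every name, URL, and ARN below is a placeholder):

```sql
-- API integration pointing at an API Gateway endpoint
CREATE OR REPLACE API INTEGRATION alert_api_int
  API_PROVIDER = AWS_API_GATEWAY
  API_AWS_ROLE_ARN = 'arn:aws:iam::123456789012:role/snowflake-alerts'
  API_ALLOWED_PREFIXES = ('https://abc123.execute-api.us-east-1.amazonaws.com/prod')
  ENABLED = TRUE;

-- External function that forwards a message to the endpoint;
-- the backing Lambda posts it to a Slack webhook or sends an email
CREATE OR REPLACE EXTERNAL FUNCTION send_slack_alert(message VARCHAR)
  RETURNS VARIANT
  API_INTEGRATION = alert_api_int
  AS 'https://abc123.execute-api.us-east-1.amazonaws.com/prod/alert';

-- Usage:
-- SELECT send_slack_alert('Test alert from Snowflake');
```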

Allen