1

I want to execute a SQL script on a schedule in Snowflake. There are several CTEs and a JavaScript UDF in my query.

My idea is to

  1. Table the query of the result.
  2. Create one stored procedure
  3. Create a task to execute the procedure

How can I create one stored procedure for the entire query and update the result set?

Example data:

WITH dataset AS (
select $1 id, $2 status, $3 created_at, $4 completed_at
from values
  ('A','created','2021-07-15 10:30:00'::timestamp, NULL), ('A','missing_info','2021-07-15 11:10:00'::timestamp,NULL),
  ('A','pending','2021-07-15 12:05:00'::timestamp, NULL), ('A','successful','2021-07-15 16:05:00'::timestamp,'2021-07-15 17:05:00'::timestamp),
  ('B','created','2021-07-16 11:30:00'::timestamp, NULL), ('B',NULL,'2021-07-16 11:30:00'::timestamp, NULL),
  ('B','successful','2021-07-16 12:30:00'::timestamp, '2021-07-16 16:30:00'::timestamp) )

UDF to calculate timediff:

create or replace function tsrange_intersection(s string, e string)
RETURNS double
LANGUAGE JAVASCRIPT
AS
$$
let minutes = 0
start = new Date(S)
end = new Date(E)
let t = start
while(t < end) {
    if ([1, 2, 3, 4, 5].includes(t.getDay())
      && [9, 10, 11, 12, 13, 14, 15, 16].includes(t.getHours())) {
        minutes += 1
    }
    t = new Date(t.getTime() + 60*1000);
}
return minutes
$$;

The query:

-- simple filtering
cte1 AS (
SELECT *
FROM dataset
WHERE id IS NOT NULL AND status IS NOT NULL 
ORDER BY created_at ),

-- retrieve only the id's which started from 'created'
cte2 AS (
SELECT *
FROM cte1
QUALIFY FIRST_VALUE(status) OVER (PARTITION BY id ORDER BY created_at) = 'created' )

-- pattern match
SELECT *
FROM cte2
MATCH_RECOGNIZE (
    PARTITION BY id
    ORDER BY created_at
    MEASURES MATCH_NUMBER() AS mn,
             MATCH_SEQUENCE_NUMBER AS msn
    ALL ROWS PER MATCH
    PATTERN (c+i+|p+i+|ps+)
    DEFINE
        c AS status='created',
        i AS status='missing_info',
        p AS status='pending',
        s AS status='successful'
    ) mr
    QUALIFY (ROW_NUMBER() OVER (PARTITION BY mn, id ORDER BY msn) = 1)
          OR(ROW_NUMBER() OVER (PARTITION BY mn, id ORDER BY msn DESC) =1)
    ORDER BY id, created_at;
       
-- retrieve the result set above
WITH cte3 AS (
SELECT *
FROM TABLE(RESULT_SCAN(LAST_QUERY_ID()))
),

-- start time of each status
cte4 AS (
SELECT
    *,
    CASE WHEN status = 'successful' THEN IFNULL(completed_at, created_at) ELSE created_at END AS start_timestamp
FROM cte3 ),

-- end time of each status
cte5 AS (
SELECT
  *,
  CASE WHEN status = 'successful' THEN COMPLETED_AT 
  ELSE IFNULL(LAG(start_timestamp, -1) OVER (PARTITION BY id ORDER BY start_timestamp), completed_at) END AS end_timestamp
FROM cte4 )

-- final query
SELECT 
    id, status, start_timestamp, end_timestamp,
    tsrange_intersection(start_timestamp, end_timestamp) AS time_diff
    
FROM cte5
ORDER BY start_timestamp
kimi
  • 525
  • 5
  • 17
  • 1
    The only way to run a script on a schedule (without using any external tools) is to use a task. It’s not clear, to me, what your issue is with creating a stored procedure to do what you want (from your example code you seem to know at least the basics of what you are doing) - please can you explain what, precisely, is the issue you are facing? – NickW Aug 02 '21 at 19:39
  • I just read that a stored procedure returns a single row that contains a single column; it is not designed to return a result set. On the other hand, a task can execute one SQL statement. What I want to achieve is to return the result set of my query on a scheduled basis. Ultimately I have one SQL statement CREATE OR REPLACE TABLE at the top. So can I simply write one TASK to achieve this? Or do I still need to write an SP? In this case, how to wrap all the CTEs into a stored procedure? In the doc, most examples are just inserting simple records, I don't see any example to return result sets. – kimi Aug 02 '21 at 20:30
  • 1
    You can write the resultset to a table within the SP. The return from a SP, while it can be used for anything, is really meant to hold success/failure information. If the SP is being run to a schedule, what were you expecting to happen to the resultset if the SP returned it - given that a SP is a standalone object so there would be nothing to “receive” the resultset ? – NickW Aug 02 '21 at 20:54
  • The result set contains id, start_time, end_time, and time_diff between the two timestamps. I needed Javascript UDF to find the time difference within business hours. But the problem is that I cannot simply run the whole query outside of Snowflake, like a BI tool that can run SQL on a schedule since the query contains Javascript. Basically, all I want is to table the query's output and regularly refresh it. In this case, the SP is still the only way to go? – kimi Aug 02 '21 at 22:26
  • 1
    It is the best way to go. You just need to have the SP actually update your target table as part of the SP, rather than have the SP return a resultset (which it can't do). – Mike Walton Aug 02 '21 at 22:30
  • @MikeWalton Thanks for the comment! How can I create one SP that can execute the entire query and update the result set? I am not quite familiar with JavaScript and the examples in the doc are not covering this. Can you please help with some of part it? I added the my query above. Please let me know if something is not clear. – kimi Aug 06 '21 at 09:11
  • It's all documented here with examples: https://docs.snowflake.com/en/sql-reference/stored-procedures-usage.html. Assuming that what you want to do can't be achieved in a single SQL statement, then you execute a SELECT statement, which returns a resultset, and then you loop through the records in the resultset doing whatever you want to be done with the records. If you need to use the function in the SP then you can define it there if you want – NickW Aug 06 '21 at 17:34

0 Answers0