
I have 4 streams: A_STREAM, B_STREAM, C_STREAM, and D_STREAM.

I have a chain of tasks where A_TASK is the parent and it has 3 child tasks (B_TASK, C_TASK, D_TASK).

CREATE TASK A_TASK
WAREHOUSE = XYZ
SCHEDULE = '15 MINUTE'
WHEN SYSTEM$STREAM_HAS_DATA('A_STREAM')
AS
DO Something;

CREATE TASK C_TASK
WAREHOUSE = XYZ
AFTER A_TASK
WHEN SYSTEM$STREAM_HAS_DATA('C_STREAM')
AS
DO Something;

Let's say A_TASK was triggered and completed, but when it came time to execute C_TASK, C_STREAM didn't have any data, so the task was skipped. Five minutes later, C_STREAM received data.

The issue is that the data from C_STREAM will never be loaded into the target table, because A_TASK won't be triggered on its next run if A_STREAM is empty. How do we tackle this kind of scenario?

I can't separate these tasks since they operate on the same target table.

In Snowflake tasks, is there a way to make a child task wait until its dependency is met?

rjain550
  • It doesn't sound like A and C are really dependent on each other in your example. Why not just load them each separately? – Mike Walton Aug 16 '20 at 00:17
  • A and C are dependent on each other. They get data from different sources, which is why I created 4 different streams, but they need to be processed sequentially and loaded into the same table. Before loading into the target table, each one does some validation based on the previously loaded file. Currently I am using a job scheduler tool like Autosys, but in Snowflake I want to use tasks only. – rjain550 Aug 17 '20 at 01:43
  • If they are dependent on each other, then it seems odd that you'd get data from one and not the other. That's sort of what I was concerned with. Maybe it would help to better understand how your current scheduler handles this case, and then we can work on a TASKS-related solution for you. – Mike Walton Aug 17 '20 at 14:16
  • In most cases it will get the data, but I have to think about the worst-case scenario as well. In the current setup, task A is the master: B is triggered only upon success of A, and C upon success of B. I tried replicating this with Snowflake tasks, but a child task is just skipped if its stream has no data, and it never gets triggered again until the parent task's stream has data. Ideally, the child task should wait until its stream has data, because that is one of its start conditions. – rjain550 Aug 17 '20 at 19:10
  • Well, you don't need to look at the stream for each task. You could replicate the same behavior that your current process has by simply running table B and C based on the success of the predecessor. You do not have to have a WHEN clause. – Mike Walton Aug 17 '20 at 21:07
  • But if the stream has no data, what's the use of running the task? – rjain550 Aug 18 '20 at 10:02
  • Right...I'm just saying that this would accomplish the same thing as your current process. I'm not sure I fully understand what the issue is with how you have things setup currently. If there is no data, it doesn't run. If there is data, it runs. If A runs and C doesn't, then next time A runs, it will pick up any new data from A, and then run C again. Your use-case description is hard to follow. I don't understand how you have dependencies, but might not have data for part of it. – Mike Walton Aug 18 '20 at 18:12
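
A rough sketch of what the comments above point toward, assuming the task bodies are harmless to run against an empty stream: let the root task fire when either stream has data, and drop the WHEN clause from the child so it always runs after A_TASK succeeds. TARGET_TABLE and the ID / PAYLOAD columns are hypothetical placeholders that do not come from the question; B_STREAM and D_STREAM could be added to the root condition the same way.

```sql
-- Root task: runs on schedule if EITHER stream has new rows.
-- The WHEN clause accepts SYSTEM$STREAM_HAS_DATA calls combined with AND / OR / NOT.
CREATE TASK A_TASK
  WAREHOUSE = XYZ
  SCHEDULE = '15 MINUTE'
  WHEN SYSTEM$STREAM_HAS_DATA('A_STREAM')
    OR SYSTEM$STREAM_HAS_DATA('C_STREAM')
AS
  INSERT INTO TARGET_TABLE (ID, PAYLOAD)      -- hypothetical table and columns
  SELECT ID, PAYLOAD FROM A_STREAM;           -- inserts zero rows if A_STREAM is empty

-- Child task: no WHEN clause, so it always runs once A_TASK completes.
-- If C_STREAM is empty, the INSERT simply adds zero rows.
CREATE TASK C_TASK
  WAREHOUSE = XYZ
  AFTER A_TASK
AS
  INSERT INTO TARGET_TABLE (ID, PAYLOAD)
  SELECT ID, PAYLOAD FROM C_STREAM;

-- Tasks are created suspended; resume the child before the root.
ALTER TASK C_TASK RESUME;
ALTER TASK A_TASK RESUME;
```

With this layout, data that lands in C_STREAM after a run is picked up on the next 15-minute schedule even when A_STREAM is empty, and the A-then-C ordering is preserved. The trade-off is that C_TASK uses warehouse time on every run of the chain, including runs where its stream has nothing to process.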
