-1

We have the following scenario. A full pipeline build is triggered by a schedule, which itself fires on either of two conditions: a dozen datasets have updated data, or it is 3:15 in the morning. The time trigger is there to ensure the pipeline doesn't start building too late; we hope that by the time the upstream data is consumed, it has been updated. The build takes between 2 and 5 hours.

Today, the schedule was triggered at around 2 am, and completed successfully at 6:45. So far so good.

But then, a second build was triggered by the same schedule, this time unexpected and unwanted.

Could it be that the time-based trigger also triggered a build while one was already executing, and that this second build was "queued" behind the ongoing one? I'm just trying to understand, so that we can tweak the triggering conditions.

Thanks in advance for your insights.

Kevin Zhang

2 Answers

1

This question boils down to two parts: (a) will my schedule trigger, and (b) what will build2 resolve the build to be when the schedule does trigger?

(a) You are correct. In your own words, the time-based trigger will also trigger a build even though a build is already executing, and that second build is "queued" behind the ongoing one.

Essentially, you can expect another instance of the schedule to run.
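To make the queuing concrete, here is a minimal toy model in Python. It is not Foundry's scheduler: `plan_runs` and its arguments are hypothetical names, times are minutes since midnight, and it assumes that each firing of the trigger yields one run which waits for any run that is still executing.

```python
def plan_runs(trigger_times, build_minutes):
    """Toy model: one run per trigger firing, queued behind any ongoing run.

    trigger_times: minutes since midnight at which the trigger fired.
    build_minutes: assumed duration of each run (hypothetical, for illustration).
    """
    runs = []
    busy_until = 0  # time at which the currently executing run finishes
    for fired_at in sorted(trigger_times):
        # A run cannot start before the previous one has finished.
        start = max(fired_at, busy_until)
        runs.append({"fired_at": fired_at, "start": start, "queued": start > fired_at})
        busy_until = start + build_minutes
    return runs


# Data trigger at 2:00 (120), time trigger at 3:15 (195), build lasting 4h45 (285).
for run in plan_runs([120, 195], build_minutes=285):
    print(run)
```

With the numbers from the question, the run that fired at 3:15 only starts at 6:45 (minute 405), right after the first build completes. Whether that queued run then does any real work is a separate matter.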

Note that you can add custom logic in the transform to abort the build, as a way to keep the schedule from effectively retriggering when a build is already executing.

(b) It might also be worth clarifying that, unless Force build is enabled on the schedule, the build service will never rebuild datasets that are up-to-date with all of their ancestors. So if the schedule is simply A -> B -> C and A updates at 2:30, the scheduled run at 3:00 isn't going to rebuild B and C if they're not stale.
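The staleness rule in (b) can be sketched as a small toy model. This is only an illustration of the idea, not the build service's actual algorithm: `datasets_to_build`, the `parents` map and the `updated_at` timestamps (minutes since midnight) are all hypothetical, and "stale" is assumed to mean that some ancestor was updated more recently than the dataset itself.

```python
def datasets_to_build(parents, updated_at, targets, force=False):
    """Toy model: return the targets a run would rebuild.

    parents: dataset name -> list of direct parent datasets.
    updated_at: dataset name -> time of its last update.
    force: models a "force build" option that rebuilds everything.
    """
    stale_cache = {}

    def is_stale(name):
        # A dataset is stale if a parent is newer than it, or a parent is itself stale.
        if name not in stale_cache:
            stale_cache[name] = any(
                is_stale(p) or updated_at[p] > updated_at[name]
                for p in parents.get(name, [])
            )
        return stale_cache[name]

    return [t for t in targets if force or is_stale(t)]


lineage = {"B": ["A"], "C": ["B"]}  # A -> B -> C

# B and C were already rebuilt after A's 2:30 update: nothing to do.
print(datasets_to_build(lineage, {"A": 150, "B": 160, "C": 170}, ["B", "C"]))
# B and C predate A's 2:30 update: both are stale.
print(datasets_to_build(lineage, {"A": 150, "B": 100, "C": 110}, ["B", "C"]))
```

In the first call the result is empty, which corresponds to a run that is triggered but has nothing to build. Passing `force=True` returns both targets regardless of freshness.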

Kevin Zhang
0

Note that triggers only determine whether a scheduled build should attempt to run. If the trigger does fire, the freshness of the target datasets then determines whether the build actually runs. So it’s possible for a trigger to fire and for the run to be ignored because all target datasets are fresh.

It’s for this reason that you might see a triggered run “queued” after an ongoing one, but whether that queued run actually builds depends entirely on whether all the output datasets are up-to-date. To avoid confusion on schedules like these, it’s important to closely investigate the triggers on your input datasets and the staleness of your output datasets.
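As a rough aid for that kind of investigation, the check below classifies a queued run from timestamps alone. It is a sketch under a loud assumption: that the previous build read its inputs when it started, so any input update after that moment leaves the outputs stale. `run_outcome` and its parameters are hypothetical names, and times are minutes since midnight.

```python
def run_outcome(run_start, input_updates, previous_build_start):
    """Toy check: does a queued run have anything to build?

    Assumes the previous build consumed its inputs at previous_build_start,
    so only input updates after that time (and up to run_start) make the
    outputs stale. This is an assumption, not documented behaviour.
    """
    newer_inputs = [
        t for t in input_updates if previous_build_start < t <= run_start
    ]
    return "builds" if newer_inputs else "ignored"


# First build started at 2:00 (120); the queued run starts at 6:45 (405).
print(run_outcome(405, input_updates=[100, 240], previous_build_start=120))
print(run_outcome(405, input_updates=[100], previous_build_start=120))
```

In the first call an input was updated at 4:00, while the first build was still running, so the queued run has work to do. In the second call no input changed after 2:00 and the run would be ignored.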

Max Magid