Scenario:

I make a POST call that triggers a process to export a file; the call returns an Export-ID. The export can take an unknown amount of time to complete, so I have to make a GET call with the Export-ID periodically to check whether it has finished. Each GET call returns a STATUS, plus a URL to the download location once the export has completed.

Goal:

I want to create a DAG that kicks off this export, then waits 20 minutes and sends the GET request to see whether the URL is present, meaning the export has completed.

Issue:

Since we don't know how long the export will take, we don't want this DAG sitting idle for 20 minutes between checks and holding up resources. Is there a way to pause/stop the DAG and release resources during this period?

What I thought of doing:

Create a DAG that has 2 tasks:
Task 1: Kick off the export.
Task 2: Unpause a second DAG that does the check and is scheduled to run at the interval we want to check at.
DAG completed.

Second DAG (Subdag), 1 task:
Task 1: Get the Export-ID and check whether the URL is available. If so, kick off the download, then pause the DAG once the download has finished. If the URL is not available, complete the DAG run and run again at the next scheduled interval.

Is this a good route to take? If not, why not, and what would be better? I have not found anything API-wise that lets me pause and unpause a DAG from Python. Is this possible?

1 Answer

Set up your process as submit_export >> consume_export, and have consume_export fail if the export isn't ready yet. Then set retries and a suitable retry interval on consume_export to cover however long you want to keep trying.
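
A minimal sketch of that retry-based layout, assuming Airflow 2.4 or later. The three callables passed to build_retry_dag (start_export, get_export_status, download) are hypothetical wrappers around your own POST, GET, and download calls, and the response shape {"status": ..., "url": ...} is an assumption as well. While consume_export is waiting for its next retry it is not running, so it does not occupy a worker slot.

```python
from datetime import datetime, timedelta


class ExportNotReady(Exception):
    """Raised when the status response has no download URL yet."""


def require_download_url(status_response: dict) -> str:
    """Return the download URL, or raise so the task fails and gets retried.

    The response shape ({"status": ..., "url": ...}) is an assumption.
    """
    url = status_response.get("url")
    if not url:
        raise ExportNotReady(
            f"export not ready, status={status_response.get('status')!r}"
        )
    return url


def build_retry_dag(start_export, get_export_status, download):
    """Wire submit_export >> consume_export.

    start_export, get_export_status and download are hypothetical
    wrappers around the export API; they are not part of Airflow.
    """
    # Imported lazily so require_download_url works without Airflow installed.
    from airflow import DAG
    from airflow.operators.python import PythonOperator

    def _consume(ti):
        # The Export-ID returned by submit_export is read back from XCom.
        export_id = ti.xcom_pull(task_ids="submit_export")
        download(require_download_url(get_export_status(export_id)))

    with DAG(
        dag_id="export_with_retries",
        start_date=datetime(2024, 1, 1),
        schedule=None,  # only runs when triggered
        catchup=False,
    ) as dag:
        submit_export = PythonOperator(
            task_id="submit_export",
            python_callable=start_export,  # return value is pushed to XCom
        )
        consume_export = PythonOperator(
            task_id="consume_export",
            python_callable=_consume,
            retries=12,  # 12 retries x 20 minutes: keep trying for about 4 hours
            retry_delay=timedelta(minutes=20),
        )
        submit_export >> consume_export
    return dag
```

In a DAG file you would assign the result to a module-level name, for example dag = build_retry_dag(...), so the scheduler can discover it. The retries and retry_delay values are placeholders; pick them to match how long an export can reasonably take.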

Alternatively, if you want to tell an export that never completed apart from a consumer step that failed, you could do submit_export >> wait_export >> consume_export. Then if you see that wait_export failed, you'll know the export didn't appear in a timely manner.
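
One way to sketch the three-step variant is to make wait_export a sensor. This is my assumption; the step could equally be a plain task with retries. The example assumes Airflow 2.4 or later and uses PythonSensor with mode="reschedule", which frees the worker slot between checks instead of sleeping in it. As before, start_export, get_export_status and download are hypothetical wrappers around your own API calls, and the "url" key in the response is assumed.

```python
from datetime import datetime


def export_is_ready(status_response: dict) -> bool:
    """True once the status response carries a download URL (assumed key: "url")."""
    return bool(status_response.get("url"))


def build_sensor_dag(start_export, get_export_status, download):
    """Wire submit_export >> wait_export >> consume_export.

    The three callables are hypothetical wrappers around the export API.
    """
    # Imported lazily so export_is_ready works without Airflow installed.
    from airflow import DAG
    from airflow.operators.python import PythonOperator
    from airflow.sensors.python import PythonSensor

    def _poke(ti):
        export_id = ti.xcom_pull(task_ids="submit_export")
        return export_is_ready(get_export_status(export_id))

    def _consume(ti):
        export_id = ti.xcom_pull(task_ids="submit_export")
        download(get_export_status(export_id)["url"])

    with DAG(
        dag_id="export_with_sensor",
        start_date=datetime(2024, 1, 1),
        schedule=None,  # only runs when triggered
        catchup=False,
    ) as dag:
        submit_export = PythonOperator(
            task_id="submit_export",
            python_callable=start_export,  # return value is pushed to XCom
        )
        wait_export = PythonSensor(
            task_id="wait_export",
            python_callable=_poke,
            poke_interval=20 * 60,  # check every 20 minutes
            timeout=4 * 60 * 60,  # fail wait_export after 4 hours
            mode="reschedule",  # release the worker slot between checks
        )
        consume_export = PythonOperator(
            task_id="consume_export",
            python_callable=_consume,
        )
        submit_export >> wait_export >> consume_export
    return dag
```

With this layout a failed wait_export means the export never became available within the timeout, while a failed consume_export means the export was ready but the download step broke.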

joebeeson