Questions tagged [airflow-scheduler]

Apache Airflow is a platform to programmatically author, schedule and monitor workflows. Its scheduler monitors all tasks and all DAGs, and triggers the task instances whose dependencies have been met.
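
As a rough illustration of "triggers the task instances whose dependencies have been met", here is a toy sketch in plain Python. It is not Airflow's implementation: the function name ready_tasks and the three-task workflow are hypothetical, and real scheduling also considers schedules, trigger rules, pools and executor state.

```python
from typing import Dict, List, Set


def ready_tasks(upstream: Dict[str, Set[str]], done: Set[str]) -> List[str]:
    """Return unfinished tasks whose upstream tasks have all completed."""
    return sorted(
        task
        for task, deps in upstream.items()
        if task not in done and deps <= done  # every dependency is done
    )


# Hypothetical workflow: extract -> transform -> load
upstream = {
    "extract": set(),
    "transform": {"extract"},
    "load": {"transform"},
}

# With nothing finished, only the task without dependencies is eligible.
print(ready_tasks(upstream, done=set()))
# Once "extract" has finished, "transform" becomes eligible.
print(ready_tasks(upstream, done={"extract"}))
```

Calling ready_tasks again after each task finishes walks the toy workflow in dependency order.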

1257 questions
10
votes
2 answers

How to skip task in Airflow operator?

Is there a way for Airflow to skip current task from the PythonOperator? For example: def execute(): if condition: skip_current_task() task = PythonOperator(task_id='task', python_callable=execute, dag=some_dag) And also marking the…
zeyger
10
votes
3 answers

How to pass parameters to Airflow on_success_callback and on_failure_callback

I have implemented email alerts on success and failure using on_success_callback and on_failure_callback. According to Airflow documentation, a context dictionary is passed as a single parameter to this function. How can I pass another parameter…
Blessy
10
votes
2 answers

Airflow: set a default value in code when Variable doesn't exist without an exception

I have a little problem: I want to do the typical conditional, like setting_x = Variable.get('setting_x') variable = setting_x if setting_x else 0. But since the Airflow model throws an exception when the key doesn't exist, it is impossible to do it…
skozz
10
votes
2 answers

Airflow - Broken DAG - Timeout

I have a DAG that executes a function that connects to a Postgres DB, deletes the contents of a table and then inserts a new data set. I am trying this locally, and when I run it, the web server takes a long time to connect and…
dark horse
10
votes
2 answers

How to migrate airflow variables between DEV and PROD environments?

We are using Airflow to schedule our data pipelines, and as part of that we have added a few connections and variables in the Airflow admin. Everything worked fine in DEV; now we want to set up the PROD environment. How do we migrate these values into PROD…
10
votes
2 answers

Airflow backfill new task added to dag

Let's say today is 2017-10-20. I have an existing DAG which has run successfully up to today. I need to add a task with a start_date of 2017-10-01. How do I make the scheduler trigger the task from 2017-10-01 to 2017-10-20 automatically?
ninjaturtle
9
votes
2 answers

How does the mode "reschedule" in Airflow Sensors work?

I have an Airflow Http sensor that calls a REST endpoint and checks for a specific value in the JSON structure returned by the API sensor = HttpSensor( soft_fail=True, task_id='http_sensor_check', http_conn_id='http_default', …
OCDev
9
votes
1 answer

airflow - how to 'Filling up the DagBag' once only

My DAG takes about 50 seconds to parse. I only use external triggers to start DAG runs, no schedules. I notice Airflow wants to fill the DagBag a lot --> on every trigger_dag command AND in the background it keeps checking the dags folder AND…
tooptoop4
9
votes
4 answers

Airflow : ExternalTaskSensor doesn't trigger the task

I have already seen this and this questions on SO and made the changes accordingly. However, my dependent DAG still gets stuck in poking state. Below is my master DAG: from airflow import DAG from airflow.operators.jdbc_operator import…
Darshan Mehta
9
votes
3 answers

Airflow External sensor gets stuck at poking

I want one DAG to start after another DAG completes. One solution is to use the external sensor; my solution is below. The problem I encounter is that the dependent DAG is stuck at poking. I checked this answer and made sure that…
sia
9
votes
1 answer

Kubernetes executor does not parallelize sub-DAG execution in Airflow

We moved away from the Celery Executor in Airflow 1.10.0 because of some limitations of execution and right now we're using KubernetesExecutor. Right now we're not able to parallelize all the tasks in some DAGs even when we change the…
Flavio
9
votes
5 answers

Airflow dags and PYTHONPATH

I have some dags that can't seem to locate python modules. Inside of the Airflow UI, I see a ton of these message variations. Broken DAG: [/home/airflow/source/airflow/dags/test.py] No module named 'paramiko' Inside of a file I can directly modify…
sebastian
9
votes
4 answers

Airflow: Why is there a start_date for operators?

I don't understand why we need a 'start_date' for the operators (task instances). Shouldn't the one that we pass to the DAG suffice? Also, if the current time is 7th Feb 2018 8.30 am UTC, and now I set the start_date of the dag to 7th Feb 2018…
soupybionics
9
votes
0 answers

airflow - all tasks being queued and not moved to execution

Airflow 1.8.1. Scheduler, worker and webserver are running in separate Docker containers on AWS. The system was operational, and now for some reason all tasks are staying in the queued state... No errors in the scheduler logs. In the worker I see this error (not sure if…
Gregory Danenberg
9
votes
0 answers

Airflow 'one_success' task not triggered

I'm running Airflow on a 4 CPU machine with LocalExecutor. I've defined an upstream task with the trigger rule one_success: create_spark_cluster_task = BashOperator( task_id='create_spark_cluster', trigger_rule='one_success', bash_command= ..., …
Tom Lous