Questions tagged [airflow]

Apache Airflow is a workflow management platform to programmatically author, schedule, and monitor workflows as directed acyclic graphs (DAGs) of tasks.

Airflow is a workflow scheduler, originally developed at Airbnb to manage its increasingly complex workflows.

10104 questions
59
votes
3 answers

How to restart a failed task on Airflow

I am using a LocalExecutor and my DAG has 3 tasks, where task(C) is dependent on task(A). Task(B) and task(A) can run in parallel, something like below: A-->C, B. So task(A) has failed but task(B) ran fine. Task(C) is yet to run as task(A) has…
Chetan J
  • 1,847
  • 5
  • 16
  • 21
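
A common approach to the question above, sketched under the assumption of Airflow 2.x: clear the failed task instance so the scheduler re-runs it (downstream tasks in the cleared range go back to scheduled), and give tasks retries so transient failures recover on their own. The dag_id and task_ids below are hypothetical.

```python
# Sketch, assuming Airflow 2.x; dag/task names are illustrative.
# A failed task can be re-run by clearing its task instance, e.g. from the CLI:
#   airflow tasks clear my_dag -t task_A -s 2021-01-01 -e 2021-01-02
# Downstream tasks (task_C here) in the cleared range are rescheduled as well.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.bash import BashOperator

default_args = {
    "retries": 2,                          # retry transient failures automatically
    "retry_delay": timedelta(minutes=5),
}

with DAG(
    dag_id="my_dag",
    start_date=datetime(2021, 1, 1),
    schedule_interval="@daily",
    default_args=default_args,
    catchup=False,
) as dag:
    task_a = BashOperator(task_id="task_A", bash_command="echo A")
    task_b = BashOperator(task_id="task_B", bash_command="echo B")
    task_c = BashOperator(task_id="task_C", bash_command="echo C")

    task_a >> task_c   # C depends on A; B runs independently
```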
58
votes
3 answers

Airflow versus AWS Step Functions for workflow

I am working on a project that grabs a set of input data from AWS S3, pre-processes and divvies it up, spins up 10K batch containers to process the divvied data in parallel on AWS Batch, post-aggregates the data, and pushes it to S3. I already have…
sanjayr
  • 1,679
  • 2
  • 20
  • 41
58
votes
5 answers

Refreshing DAGs in Apache Airflow without restarting the web server

Is there any way to reload the jobs without having to restart the server?
ryudice
  • 36,476
  • 32
  • 115
  • 163
57
votes
7 answers

First-time login to Apache Airflow asks for a username and password; what are they?

I've just installed Apache Airflow and I'm launching the webserver for the first time. It asks me for a username and password, but I haven't set any. Can you let me know what the default username and password for Airflow is?
55
votes
3 answers

For Apache Airflow, how can I pass parameters when manually triggering a DAG via the CLI?

I use Airflow to manage ETL task execution and scheduling. A DAG has been created and it works fine, but is it possible to pass parameters when manually triggering the DAG via the CLI? For example: my DAG runs every day at 01:30 and processes data for…
Frank Liu
  • 553
  • 1
  • 5
  • 6
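
A sketch of the usual pattern for the question above, assuming Airflow 2.x: the payload passed with `--conf` is available to tasks as `dag_run.conf`. The dag_id, parameter name, and fallback behaviour are assumptions for illustration.

```python
# Sketch, assuming Airflow 2.x; dag/param names are made up for illustration.
# Trigger manually with a config payload, e.g.:
#   airflow dags trigger my_etl_dag --conf '{"process_date": "2021-06-01"}'
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def run_etl(**context):
    conf = context["dag_run"].conf or {}
    # Fall back to the run's logical date when no conf is supplied (scheduled runs).
    process_date = conf.get("process_date", context["ds"])
    print(f"Processing data for {process_date}")


with DAG(
    dag_id="my_etl_dag",
    start_date=datetime(2021, 1, 1),
    schedule_interval="30 1 * * *",
    catchup=False,
) as dag:
    PythonOperator(task_id="run_etl", python_callable=run_etl)
```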
55
votes
3 answers

Efficient way to deploy DAG files on Airflow

Are there any best practices for deploying new DAGs to Airflow? I saw a couple of comments on the Google forum stating that the DAGs are kept in a Git repository and synced periodically to the local location in…
Sreenath Kamath
  • 663
  • 1
  • 7
  • 17
55
votes
9 answers

Removing Airflow task logs

I'm running 5 DAGs which have generated a total of about 6 GB of log data in the base_log_folder over a month's period. I just added a remote_base_log_folder, but it seems it does not exclude logging to the base_log_folder. Is there any way to…
jompa
  • 551
  • 1
  • 4
  • 3
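
Airflow keeps writing local task logs even when remote logging is enabled, so cleanup is typically handled by a cron job or a small maintenance DAG. Below is a minimal sketch of the latter; the base_log_folder path and retention period are assumptions that should match your airflow.cfg.

```python
# Sketch of a log-cleanup maintenance DAG; the path and retention window are
# assumptions -- adjust them to the [logging] base_log_folder in airflow.cfg.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

BASE_LOG_FOLDER = "/opt/airflow/logs"   # hypothetical location
RETENTION_DAYS = 30

with DAG(
    dag_id="airflow_log_cleanup",
    start_date=datetime(2021, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    BashOperator(
        task_id="delete_old_task_logs",
        # Remove log files older than the retention window, then prune empty dirs.
        bash_command=(
            f"find {BASE_LOG_FOLDER} -type f -name '*.log' -mtime +{RETENTION_DAYS} -delete && "
            f"find {BASE_LOG_FOLDER} -type d -empty -delete"
        ),
    )
```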
55
votes
3 answers

Airflow parallelism

The LocalExecutor spawns new processes while scheduling tasks. Is there a limit to the number of processes it creates? I needed to change it, so I need to know the difference between the scheduler's "max_threads" and "parallelism" in airflow.cfg…
sidd607
  • 1,339
  • 2
  • 18
  • 29
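
Roughly, and hedged against version differences: `parallelism` caps how many task instances may run at once across the whole installation, `dag_concurrency` caps them per DAG, and the scheduler's `max_threads` (renamed `parsing_processes` in later releases) only controls how many processes the scheduler uses to parse DAG files. A small sketch for inspecting the effective values, assuming an Airflow 1.10-era configuration:

```python
# Sketch: print the effective concurrency settings. Section/key names assume an
# Airflow 1.10-era airflow.cfg (max_threads became parsing_processes in 2.x).
from airflow.configuration import conf

# Max task instances running at once across the whole Airflow installation.
print("parallelism:", conf.getint("core", "parallelism"))

# Max task instances a single DAG may run concurrently.
print("dag_concurrency:", conf.getint("core", "dag_concurrency"))

# Number of scheduler processes used to parse DAG files (not task slots).
print("scheduler max_threads:", conf.getint("scheduler", "max_threads"))
```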
54
votes
15 answers

Airflow scheduler does not appear to be running after executing a task

When there is a task running, Airflow pops up a notice saying the scheduler does not appear to be running, and it keeps showing until the task finishes: "The scheduler does not appear to be running. Last heartbeat was received 5 minutes ago." The DAGs…
DennisLi
  • 3,915
  • 6
  • 30
  • 66
52
votes
5 answers

Airflow "This DAG isnt available in the webserver DagBag object "

When I put a new DAG Python script in the dags folder, I can view a new DAG entry in the DAG UI, but it is not enabled automatically. On top of that, it does not seem to be loaded properly either. I can only click on the Refresh button a few times on…
santi
  • 651
  • 1
  • 6
  • 6
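
When a DAG shows up but is reported as missing from the webserver's DagBag, the cause is often an import error or a webserver that has not re-parsed the file yet. One hedged way to check is to load the DagBag yourself in the same environment Airflow uses; the dag_folder path below is hypothetical.

```python
# Sketch: check whether DAG files parse cleanly before blaming the webserver.
# Run this with the same Python environment and path Airflow itself uses.
from airflow.models import DagBag

dag_bag = DagBag(dag_folder="/opt/airflow/dags", include_examples=False)

# Any import/parse errors keep a DAG out of the webserver's DagBag.
for path, error in dag_bag.import_errors.items():
    print(f"Failed to load {path}:\n{error}")

print("Loaded DAG ids:", list(dag_bag.dags))
```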
51
votes
4 answers

How do I force a task on airflow to fail?

I have a Python callable process_csv_entries that processes CSV file entries. I want my task to complete successfully only if all entries were processed successfully; the task should fail otherwise. def process_csv_entries(csv_file): # Boolean …
Mask
  • 705
  • 1
  • 5
  • 10
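
For the question above: any uncaught exception raised by the callable marks the task as failed, and raising AirflowFailException fails it immediately without consuming retries. The sketch below reuses the asker's process_csv_entries name, but the per-row handling is an illustrative assumption.

```python
# Sketch: raise an exception to fail the task; AirflowFailException skips retries.
# The CSV row handling here is illustrative, not the asker's actual logic.
import csv

from airflow.exceptions import AirflowFailException


def process_row(row):
    # Hypothetical per-row processing; replace with real logic.
    if not row:
        raise ValueError("empty row")


def process_csv_entries(csv_file):
    failed_rows = 0
    with open(csv_file, newline="") as handle:
        for row in csv.reader(handle):
            try:
                process_row(row)
            except Exception:
                failed_rows += 1

    if failed_rows:
        # Mark the task as failed if any entry could not be processed.
        raise AirflowFailException(f"{failed_rows} CSV entries failed to process")
```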
50
votes
2 answers

Airflow pass parameters to dependent task

What is the way to pass parameters into dependent tasks in Airflow? I have a lot of bash files, and I'm trying to migrate this approach to Airflow, but I don't know how to pass some properties between tasks. This is a real example: #sqoop bash…
Carleto
  • 951
  • 1
  • 9
  • 17
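
The usual mechanism for the question above is XCom: a value returned by one task can be pulled by a downstream task. A minimal sketch, assuming Airflow 2.x; the dag_id, task_ids, and the file path being passed are made up for illustration.

```python
# Sketch, assuming Airflow 2.x: pass a value between dependent tasks via XCom.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract(**context):
    # The return value is pushed to XCom automatically (key="return_value").
    return "/tmp/exported_table.csv"


def load(**context):
    path = context["ti"].xcom_pull(task_ids="extract")
    print(f"Loading file produced by the upstream task: {path}")


with DAG(
    dag_id="pass_params_example",
    start_date=datetime(2021, 1, 1),
    schedule_interval=None,
    catchup=False,
) as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    load_task = PythonOperator(task_id="load", python_callable=load)

    extract_task >> load_task
```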
50
votes
6 answers

How to run Airflow on Windows

The usual instructions for running Airflow do not apply in a Windows environment: # airflow needs a home, ~/airflow is the default, # but you can lay foundation somewhere else if you prefer # (optional) export AIRFLOW_HOME=~/airflow # install from…
Rafael
  • 1,018
  • 1
  • 10
  • 18
48
votes
7 answers

Airbnb Airflow using all system resources

We've set up Airbnb/Apache Airflow for our ETL using the LocalExecutor, and as we've started building more complex DAGs, we've noticed that Airflow has started using up incredible amounts of system resources. This is surprising to us because we mostly…
jdotjdot
  • 16,134
  • 13
  • 66
  • 118
47
votes
3 answers

How to set dependencies between DAGs in Airflow?

I am using Airflow to schedule batch jobs. I have one DAG (A) that runs every night and another DAG (B) that runs once per month. B depends on A having completed successfully. However, B takes a long time to run, so I would like to keep it in a…
Conor
  • 1,509
  • 2
  • 20
  • 28
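
One hedged way to express the cross-DAG dependency above, assuming Airflow 2.x, is an ExternalTaskSensor in DAG B that waits for the final task of DAG A (the alternative is a TriggerDagRunOperator at the end of A). Because A is nightly and B is monthly, the sensor must be pointed at the specific A run to wait for; the dag_ids, task_ids, and the execution_date_fn below are assumptions.

```python
# Sketch, assuming Airflow 2.x: DAG B waits for DAG A's final task before running.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator
from airflow.sensors.external_task import ExternalTaskSensor

with DAG(
    dag_id="dag_b_monthly",
    start_date=datetime(2021, 1, 1),
    schedule_interval="@monthly",
    catchup=False,
) as dag:
    wait_for_a = ExternalTaskSensor(
        task_id="wait_for_dag_a",
        external_dag_id="dag_a_nightly",
        external_task_id="final_task",
        # Map B's monthly run onto the matching nightly run of A (assumption:
        # both runs share the same logical date).
        execution_date_fn=lambda logical_date: logical_date,
        mode="reschedule",          # free the worker slot while waiting
        timeout=6 * 60 * 60,
    )

    run_b = BashOperator(task_id="run_monthly_job", bash_command="echo run B")

    wait_for_a >> run_b
```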