I have an EC2 instance running Airflow 1.8.0 with the LocalExecutor. Per the docs, I would have expected one of the following two commands to start the scheduler in daemon mode:
airflow scheduler --daemon --num_runs=20
or
airflow scheduler --daemon=True --num_runs=5
But that isn't the case. The first command appears to work, but it just emits the following output and returns to the terminal without leaving any background process running:
[2017-09-28 18:15:02,794] {__init__.py:57} INFO - Using executor LocalExecutor
[2017-09-28 18:15:03,064] {driver.py:120} INFO - Generating grammar tables from /usr/lib/python3.5/lib2to3/Grammar.txt
[2017-09-28 18:15:03,203] {driver.py:120} INFO - Generating grammar tables from /usr/lib/python3.5/lib2to3/PatternGrammar.txt
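To confirm that nothing survived the daemonize attempt, a quick look at the process list (my own verification step, not part of the output above) comes back empty:

ps aux | grep '[a]irflow scheduler'   # the [a] keeps grep from matching itself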
The second command produces the error:
airflow scheduler: error: argument -D/--daemon: ignored explicit argument 'True'
Which is odd, because according to the docs --daemon=True should be a valid argument for the airflow scheduler call.
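The error itself looks like standard argparse behavior for a boolean switch: a flag defined with action="store_true" accepts no explicit value. A minimal sketch (my assumption about how the CLI declares the flag, not the actual Airflow source) reproduces the exact message:

import argparse

parser = argparse.ArgumentParser(prog="airflow scheduler")
# Assumption: the daemon flag is a boolean switch that takes no value.
parser.add_argument("-D", "--daemon", action="store_true")
parser.parse_args(["--daemon=True"])
# Prints usage, then exits with:
# airflow scheduler: error: argument -D/--daemon: ignored explicit argument 'True'

If that assumption holds, the bare --daemon form (as in my first command) is the only one argparse will accept.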
Digging a little deeper led me to this StackOverflow post, where one of the responses recommends using systemd to handle the airflow scheduler as a background process, following the code available in this repo.
My lightly-edited adaptations of those scripts are posted as the following Gists (the upstream service unit they adapt is sketched after the list). I am using a vanilla m4.xlarge EC2 instance with Ubuntu 16.04.3:
- /etc/sysconfig/airflow
- /usr/lib/systemd/system/airflow-scheduler.service
- /etc/tmpfiles.d/airflow.conf
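For context, the service unit those Gists adapt looks roughly like this (reproduced from memory of the repo's template, so treat it as a sketch rather than my exact file; my version changes the user and paths):

[Unit]
Description=Airflow scheduler daemon
After=network.target postgresql.service mysql.service

[Service]
EnvironmentFile=/etc/sysconfig/airflow
User=airflow
Group=airflow
Type=simple
ExecStart=/bin/airflow scheduler
Restart=always
RestartSec=5s

[Install]
WantedBy=multi-user.target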
From there I call:
sudo systemctl enable airflow-scheduler
sudo systemctl start airflow-scheduler
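To see whether the unit actually came up, the standard checks are (the -u flag scopes journalctl to just this unit):

sudo systemctl status airflow-scheduler
sudo journalctl -u airflow-scheduler -f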
And nothing happens. While I have much more complex DAGs running on this instance, I am using a dummy DAG as a simple test case that also acts as a listener, letting me know when the scheduler is operating as planned (a sketch of that test DAG follows).
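The test DAG is roughly shaped like this (a sketch, not my actual file: the dag_id matches the filename in the logs below, but the operator and command are illustrative):

from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.bash_operator import BashOperator

# A start date in the past plus a one-minute interval means the scheduler
# should backfill continuously, acting as a heartbeat/listener.
dag = DAG(
    dag_id="scheduler_test_dag",
    start_date=datetime(2017, 9, 9),
    schedule_interval=timedelta(minutes=1),
)

tick = BashOperator(
    task_id="tick",
    bash_command="date >> /tmp/scheduler_test.out",
    dag=dag,
)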
I've been using journalctl -f to debug. Here are a few lines of output from the scheduler process. There's no obvious problem, but my tasks aren't executing and no logs are being produced for the test DAG that would help me zoom in on the error. Is the problem in here somewhere?
Sep 28 18:39:30 ip-172-31-15-209 airflow[20603]: [2017-09-28 18:39:30,965] {dag_processing.py:627} INFO - Started a process (PID: 21822) to generate tasks for /home/ubuntu/airflow/dags/scheduler_test_dag.py - logging into /home/ubuntu/airflow/logs/scheduler/2017-09-28/scheduler_test_dag.py.log
Sep 28 18:39:31 ip-172-31-15-209 airflow[20603]: [2017-09-28 18:39:31,016] {jobs.py:1002} INFO - No tasks to send to the executor
Sep 28 18:39:31 ip-172-31-15-209 airflow[20603]: [2017-09-28 18:39:31,020] {jobs.py:1440} INFO - Heartbeating the executor
Sep 28 18:39:32 ip-172-31-15-209 airflow[20603]: [2017-09-28 18:39:32,022] {jobs.py:1404} INFO - Heartbeating the process manager
Sep 28 18:39:32 ip-172-31-15-209 airflow[20603]: [2017-09-28 18:39:32,023] {jobs.py:1440} INFO - Heartbeating the executor
Sep 28 18:39:33 ip-172-31-15-209 airflow[20603]: [2017-09-28 18:39:33,024] {jobs.py:1404} INFO - Heartbeating the process manager
Sep 28 18:39:33 ip-172-31-15-209 airflow[20603]: [2017-09-28 18:39:33,025] {dag_processing.py:559} INFO - Processor for /home/ubuntu/airflow/dags/capone_dash_dag.py finished
Sep 28 18:39:33 ip-172-31-15-209 airflow[20603]: [2017-09-28 18:39:33,026] {dag_processing.py:559} INFO - Processor for /home/ubuntu/airflow/dags/scheduler_test_dag.py finished
When I run airflow scheduler manually, this all works fine. Since my test DAG has a start date of September 9, it just keeps backfilling every minute since then, producing a running time ticker. When I use systemd to run the scheduler as a daemon, however, it's totally quiet, with no obvious source of the error.
Any thoughts?