
I am new to Airflow. I am trying to run the Airflow scheduler as a daemon process, but it does not stay alive for long. I have configured LocalExecutor in the airflow.cfg file and ran the following command to start the scheduler. (I am using Google Compute Engine and accessing the server via PuTTY.)

airflow scheduler --daemon --num_runs=5 --log-file=/root/airflow/logs/scheduler.log

When I run this command, the airflow scheduler starts and I can see the airflow-scheduler.pid file in my airflow home folder, but the process does not live for long. When I close the PuTTY session and reconnect to the server, I cannot find the scheduler process. Am I missing something? How can I run the airflow scheduler as a daemon process?


3 Answers


I had a similar problem. My Airflow scheduler would not keep running when I started it as a daemon:

airflow scheduler -D

But the scheduler did work when I ran it in the foreground. After I deleted the airflow-scheduler.err file and reran the scheduler as a daemon process, it started working:

rm airflow-scheduler.err
airflow scheduler -D
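
A hedged sketch of this cleanup (the filenames follow Airflow's default daemon-file naming under AIRFLOW_HOME; the relaunch line is commented out so you can review before running):

```shell
# Remove stale daemon artifacts left behind by a crashed scheduler; a
# leftover airflow-scheduler.err (and sometimes the .pid file) can prevent
# a new daemon from starting. Assumes the default AIRFLOW_HOME of ~/airflow.
AIRFLOW_HOME="${AIRFLOW_HOME:-$HOME/airflow}"
rm -f "$AIRFLOW_HOME/airflow-scheduler.err" \
      "$AIRFLOW_HOME/airflow-scheduler.pid"   # -f: ignore missing files
# airflow scheduler -D    # then relaunch the scheduler as a daemon
```

Only delete the .pid file if you are sure no scheduler process is still running.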

You can use systemd or upstart as described here:

https://github.com/apache/incubator-airflow/tree/master/scripts/systemd
https://github.com/apache/incubator-airflow/tree/master/scripts/upstart

Here are the instructions, in case the links break in the future.

The provided systemd files are tested on RedHat-based systems. Copy (or link) them to /usr/lib/systemd/system and copy airflow.conf to /etc/tmpfiles.d/ or /usr/lib/tmpfiles.d/. Copying airflow.conf ensures /run/airflow is created with the right owner and permissions (0755 airflow airflow).

You can then start the different servers with systemctl start [service]. Enabling services can be done by issuing

systemctl enable [service]

By default the environment configuration points to /etc/sysconfig/airflow. You can copy the "airflow" file to this directory and adjust it to your liking. Make sure to specify the SCHEDULER_RUNS variable.

With some minor changes they probably work on other systemd systems.

You can modify the configuration files provided below to reflect your environment.

Content of /etc/sysconfig/airflow file

# This file is the environment file for Airflow. Put this file in /etc/sysconfig/airflow per default
# configuration of the systemd unit files.
#
# AIRFLOW_CONFIG=
# AIRFLOW_HOME=
#
# Required setting; 0 sets it to unlimited. The scheduler will be restarted after every X runs.
SCHEDULER_RUNS=5

Content of /etc/tmpfiles.d/airflow.conf or /usr/lib/tmpfiles.d/airflow.conf file

D /run/airflow 0755 airflow airflow

Content of /usr/lib/systemd/system/airflow-scheduler.service

[Unit]
Description=Airflow scheduler daemon
After=network.target postgresql.service mysql.service redis.service rabbitmq-server.service
Wants=postgresql.service mysql.service redis.service rabbitmq-server.service

[Service]
EnvironmentFile=/etc/sysconfig/airflow
User=airflow
Group=airflow
Type=simple
ExecStart=/bin/airflow scheduler -n ${SCHEDULER_RUNS}
KillMode=process
Restart=always
RestartSec=5s

[Install]
WantedBy=multi-user.target
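
Once the unit file is in place, the enable/start sequence looks roughly like this (a sketch assuming a systemd host and root privileges; it is guarded so the commands are skipped, not failed, on machines where systemd or the unit file is absent):

```shell
# Reload systemd so it sees the new unit, then enable and start it.
unit=airflow-scheduler
if command -v systemctl >/dev/null 2>&1 \
   && [ -f /usr/lib/systemd/system/airflow-scheduler.service ]; then
    systemctl daemon-reload      # re-read unit files after copying them in
    systemctl enable "$unit"     # register the scheduler to start at boot
    systemctl start "$unit"      # launch the scheduler now
fi
```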
As the links may break in the future, it would be very useful to include the information relevant to answering this question in your answer. – cavpollo Nov 11 '16 at 21:27
Thanks for the feedback, cavpollo. I have added all the relevant information. – Dmitri Safine Nov 11 '16 at 21:53

--num_runs=5 makes the scheduler exit after five runs. You can remove that argument to keep the scheduler running indefinitely.

Ideally you should run the scheduler under a process supervisor, so that when the process crashes or stops, it is restarted automatically.
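
For example, a supervisord program entry for the scheduler might look like the fragment below. This is a sketch: the command path, user, and log locations are assumptions to adapt to your install.

```ini
; /etc/supervisor/conf.d/airflow-scheduler.conf (hypothetical path)
[program:airflow-scheduler]
command=/usr/local/bin/airflow scheduler   ; adjust to the output of `which airflow`
user=airflow
autostart=true
autorestart=true                           ; restart the scheduler if it dies
stderr_logfile=/var/log/airflow/scheduler.err.log
stdout_logfile=/var/log/airflow/scheduler.out.log
```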

Re "Ideally you should run that scheduler under supervisor, so when..": I guess `Restart=always` in the `[Service]` section does exactly that – y2k-shubham Jan 09 '19 at 14:57