
I have configured my Airflow setup to run with systemd according to this. It worked fine for a couple of days, but it has since thrown errors that I can't figure out how to fix. Running sudo systemctl start airflow-webserver.service doesn't really do anything, while running airflow webserver directly works (however, we need systemd for our purposes).

To understand the error, I ran sudo systemctl status airflow-webserver.service, which gave the following status and error:

Feb 20 18:54:43 ip-172-31-25-17.ec2.internal airflow[19660]: [2019-02-20 18:54:43,774] {models.py:258} INFO - Filling up the DagBag from /home/ec2-user/airflow/dags
Feb 20 18:54:43 ip-172-31-25-17.ec2.internal airflow[19660]: /home/ec2-user/airflow/dags/statcan_1410009501.py:33: SyntaxWarning: name 'pg_hook' is assigned to before global declaration
Feb 20 18:54:43 ip-172-31-25-17.ec2.internal airflow[19660]: global pg_hook
Feb 20 18:54:43 ip-172-31-25-17.ec2.internal airflow[19660]: /usr/lib/python2.7/site-packages/airflow/utils/helpers.py:346: DeprecationWarning: Importing 'PythonOperator' directly from 'airflow.operators' has been deprecated. Please import from 'airflow.operators.[operat...irely in Airflow 2.0.
Feb 20 18:54:43 ip-172-31-25-17.ec2.internal airflow[19660]: DeprecationWarning)
Feb 20 18:54:43 ip-172-31-25-17.ec2.internal airflow[19660]: /usr/lib/python2.7/site-packages/airflow/utils/helpers.py:346: DeprecationWarning: Importing 'BashOperator' directly from 'airflow.operators' has been deprecated. Please import from 'airflow.operators.[operator...irely in Airflow 2.0.
Feb 20 18:54:43 ip-172-31-25-17.ec2.internal airflow[19660]: DeprecationWarning)
Feb 20 18:54:44 ip-172-31-25-17.ec2.internal airflow[19660]: [2019-02-20 18:54:44,528] {settings.py:174} INFO - setting.configure_orm(): Using pool settings. pool_size=5, pool_recycle=1800
Feb 20 18:54:45 ip-172-31-25-17.ec2.internal airflow[19660]: [2019-02-20 18:54:45 +0000] [19733] [INFO] Starting gunicorn 19.9.0
Feb 20 18:54:45 ip-172-31-25-17.ec2.internal airflow[19660]: Error: /run/airflow doesn't exist. Can't create pidfile.

The scheduler seems to be working fine, as verified after running both systemctl status airflow-scheduler.service and journalctl -f.

Here is the setup of the relevant systemd files:

/usr/lib/systemd/system/airflow-webserver.service

[Unit]
Description=Airflow scheduler daemon
After=network.target postgresql.service mysql.service redis.service rabbitmq-server.service
Wants=postgresql.service mysql.service redis.service rabbitmq-server.service

[Service]
EnvironmentFile=/etc/sysconfig/airflow
User=ec2-user
Type=simple
ExecStart=/bin/airflow scheduler
Restart=always
RestartSec=5s

[Install]
WantedBy=multi-user.target

/etc/tmpfiles.d/airflow.conf

D /run/airflow 0755 airflow airflow

/etc/sysconfig/airflow

AIRFLOW_CONFIG= $AIRFLOW_HOME/airflow.cfg
AIRFLOW_HOME= /home/ec2-user/airflow

Prior to this error, I moved my Airflow installation from the root directory to my home directory. I'm not sure whether this affected my setup, but I'm mentioning it in case it's relevant.

Can anyone explain the error and how to fix it? I tried to configure systemd as closely as possible to the instructions, but maybe I'm missing something.

Edit 2:

Sorry, I pasted the wrong code. This is my actual code for airflow-webserver.service:

[Unit]
Description=Airflow webserver daemon
After=network.target postgresql.service mysql.service redis.service rabbitmq-server.service
Wants=postgresql.service mysql.service redis.service rabbitmq-server.service

[Service]
EnvironmentFile=/etc/sysconfig/airflow
User=ec2-user
Type=simple
ExecStart=/bin/airflow webserver --pid /run/airflow/webserver.pid
Restart=on-failure
RestartSec=5s
PrivateTmp=true

[Install]
WantedBy=multi-user.target

3 Answers


I encountered this issue too and was able to resolve it by providing runtime directory parameters under [Service] in the airflow-webserver.service unit file:

[Service]
RuntimeDirectory=airflow
RuntimeDirectoryMode=0775

I was not able to figure out how to get it to work with /etc/tmpfiles.d/airflow.conf alone.
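For reference, a complete [Service] section combining the unit file from the question with these directives might look like the following sketch (paths, user name, and ExecStart are taken from the question and may differ on your system):

```ini
[Service]
EnvironmentFile=/etc/sysconfig/airflow
User=ec2-user
Type=simple
# RuntimeDirectory tells systemd to create /run/airflow (owned by User)
# when the service starts, so the pidfile path exists before gunicorn runs
RuntimeDirectory=airflow
RuntimeDirectoryMode=0775
ExecStart=/bin/airflow webserver --pid /run/airflow/webserver.pid
Restart=on-failure
RestartSec=5s
PrivateTmp=true
```

Note that RuntimeDirectory takes a name relative to /run, not an absolute path, and systemd also removes the directory when the service stops.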

dstandish
  • @chorbs, could you please provide more detail on why this worked for you? – Newskooler Mar 18 '19 at 20:16
  • @Newskooler this is my first time using systemd, so I am by no means an expert, but I assume this tells systemd to create the dir /run/airflow with permissions 0775, and that when you don't specify this, the directory won't be created, or won't be created with sufficiently permissive permissions – dstandish Mar 18 '19 at 23:36
  • That's exactly what it does. It makes the `/etc/tmpfiles.d/airflow.conf` redundant. I don't know why it was not added to the airflow repo like this... – Newskooler Mar 19 '19 at 13:29
  • I added RuntimeDirectoryMode=775, but I am still getting the same error – naveen kumar Aug 17 '19 at 07:25
  • did you also add `RuntimeDirectory`? – dstandish Aug 17 '19 at 15:56

The config file /etc/tmpfiles.d/airflow.conf is used by the systemd-tmpfiles-setup service at boot, so a server restart should create the /run/airflow directory. It's not possible to simply restart that service, as per https://github.com/systemd/systemd/issues/8684.

As suggested at the above link, after copying airflow.conf to /etc/tmpfiles.d/, just run sudo systemd-tmpfiles --create and /run/airflow should get created.
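The steps above can be sketched as shell commands (the source path for airflow.conf is an assumption here; use wherever your copy of the file lives, and run with root privileges):

```
# copy the tmpfiles snippet into place
sudo cp airflow.conf /etc/tmpfiles.d/airflow.conf

# apply all tmpfiles.d entries now, without waiting for a reboot
sudo systemd-tmpfiles --create

# verify the directory exists with the expected owner and mode
ls -ld /run/airflow
```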

rubpa

It looks like you are running the scheduler and not the webserver:

ExecStart=/bin/airflow scheduler

You might want to do something like:

ExecStart=/bin/airflow webserver -p 8080 --pid /run/airflow/webserver.pid

Maybe you just copy-pasted the wrong file; in that case, do share the correct one (airflow-webserver.service) so we can help you troubleshoot this.