I have successfully installed Apache Airflow locally via pip, with a few needed pins:

pip3 install zipp==3.1.0
pip3 install sqlalchemy==1.3.24
python3 -m pip install virtualenv
pip3 install apache-airflow[cncf.kubernetes]

pip3 install apache-airflow

Since I am new to all of this, I am starting with the basics. I first tried airflow standalone, but nowhere in the docs could I find the default username and password for it, so instead I initialized the database and created a basic admin user:

airflow db init
airflow users create --role Admin --username admin --email admin --firstname admin --lastname admin --password admin

Now this just needed to start, and I realized I need to start the scheduler AND the webserver. For some reason my auto script doesn't do this, so I have to do it manually:

airflow scheduler &
airflow webserver

All is okay now: the GUI is up and running and things seem fine. I want to start this first DAG I found, called

example_bash_operator

The issue is that when I click on the name, or click on go, it works about half the time. More often than not, the first few times I click anything I am greeted with an error:

Python version: 3.8.10
Airflow version: 2.2.3
Node: juju-2dd159-310.lxd
-------------------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/ubuntu/.local/lib/python3.8/site-packages/flask/app.py", line 2447, in wsgi_app
    response = self.full_dispatch_request()
  File "/home/ubuntu/.local/lib/python3.8/site-packages/flask/app.py", line 1952, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/home/ubuntu/.local/lib/python3.8/site-packages/flask/app.py", line 1821, in handle_user_exception
    reraise(exc_type, exc_value, tb)
  File "/home/ubuntu/.local/lib/python3.8/site-packages/flask/_compat.py", line 39, in reraise
    raise value
  File "/home/ubuntu/.local/lib/python3.8/site-packages/flask/app.py", line 1950, in full_dispatch_request
    rv = self.dispatch_request()
  File "/home/ubuntu/.local/lib/python3.8/site-packages/flask/app.py", line 1936, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "/home/ubuntu/.local/lib/python3.8/site-packages/airflow/www/auth.py", line 51, in decorated
    return func(*args, **kwargs)
  File "/home/ubuntu/.local/lib/python3.8/site-packages/airflow/www/decorators.py", line 72, in wrapper
    return f(*args, **kwargs)
  File "/home/ubuntu/.local/lib/python3.8/site-packages/airflow/utils/session.py", line 70, in wrapper
    return func(*args, session=session, **kwargs)
  File "/home/ubuntu/.local/lib/python3.8/site-packages/airflow/www/views.py", line 1732, in trigger
    if unpause and dag.is_paused:
  File "/home/ubuntu/.local/lib/python3.8/site-packages/airflow/models/dag.py", line 1081, in is_paused
    warnings.warn(
  File "/usr/lib/python3.8/warnings.py", line 109, in _showwarnmsg
    sw(msg.message, msg.category, msg.filename, msg.lineno,
  File "/home/ubuntu/.local/lib/python3.8/site-packages/airflow/settings.py", line 117, in custom_show_warning
    write_console.print(msg, soft_wrap=True)
  File "/home/ubuntu/.local/lib/python3.8/site-packages/rich/console.py", line 1642, in print
    self._buffer.extend(new_segments)
  File "/home/ubuntu/.local/lib/python3.8/site-packages/rich/console.py", line 842, in __exit__
    self._exit_buffer()
  File "/home/ubuntu/.local/lib/python3.8/site-packages/rich/console.py", line 800, in _exit_buffer
    self._check_buffer()
  File "/home/ubuntu/.local/lib/python3.8/site-packages/rich/console.py", line 1935, in _check_buffer
    self.file.flush()
BrokenPipeError: [Errno 32] Broken pipe

If I ignore this and wait a minute, or just try again, it suddenly works. Any clue how to smooth this experience out?

EDIT: in case this helps answer the question:

ubuntu@juju-2dd159-311:~$ pip --version
pip 20.0.2 from /usr/lib/python3/dist-packages/pip (python 3.8)
ubuntu@juju-2dd159-311:~$ python3 --version
Python 3.8.10

EDIT #2

I followed these instructions to install with the constraints file, as recommended: https://airflow.apache.org/docs/apache-airflow/stable/start/local.html
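For reference, the constraints-based install from that page looks roughly like the sketch below for my versions. The helper name constraint_url is my own, and the script only prints the pip command so it can be reviewed before running it.

```shell
#!/bin/bash
# Build the constraints URL for an Airflow version and a Python major.minor version.
# constraint_url is a hypothetical helper name, not something Airflow ships.
constraint_url() {
  local airflow_version="$1" python_version="$2"
  echo "https://raw.githubusercontent.com/apache/airflow/constraints-${airflow_version}/constraints-${python_version}.txt"
}

AIRFLOW_VERSION=2.2.3   # the version shown in the traceback above
# Major.minor of the local interpreter, e.g. 3.8
PYTHON_VERSION="$(python3 --version | cut -d ' ' -f 2 | cut -d '.' -f 1-2)"

# Print the install command instead of running it, so it can be checked first.
echo pip3 install "apache-airflow==${AIRFLOW_VERSION}" \
  --constraint "$(constraint_url "$AIRFLOW_VERSION" "$PYTHON_VERSION")"
```

Running the printed command installs Airflow with the dependency versions it was tested against, which replaces the manual zipp and sqlalchemy pins.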

This greatly improved the stability of the GUI. However, I then started to hook up the PostgreSQL database, and now I can't even log in without a broken pipe error:

Python version: 3.8.10
Airflow version: 2.2.3
Node: juju-2dd159-318.lxd
-------------------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/ubuntu/.local/lib/python3.8/site-packages/flask/app.py", line 2447, in wsgi_app
    response = self.full_dispatch_request()
  File "/home/ubuntu/.local/lib/python3.8/site-packages/flask/app.py", line 1952, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/home/ubuntu/.local/lib/python3.8/site-packages/flask/app.py", line 1821, in handle_user_exception
    reraise(exc_type, exc_value, tb)
  File "/home/ubuntu/.local/lib/python3.8/site-packages/flask/_compat.py", line 39, in reraise
    raise value
  File "/home/ubuntu/.local/lib/python3.8/site-packages/flask/app.py", line 1950, in full_dispatch_request
    rv = self.dispatch_request()
  File "/home/ubuntu/.local/lib/python3.8/site-packages/flask/app.py", line 1936, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "/home/ubuntu/.local/lib/python3.8/site-packages/airflow/www/auth.py", line 51, in decorated
    return func(*args, **kwargs)
  File "/home/ubuntu/.local/lib/python3.8/site-packages/airflow/www/views.py", line 718, in index
    paging=wwwutils.generate_pages(
  File "/home/ubuntu/.local/lib/python3.8/site-packages/airflow/www/utils.py", line 113, in generate_pages
    previous_node = Markup(
  File "/home/ubuntu/.local/lib/python3.8/site-packages/jinja2/utils.py", line 838, in __new__
    warnings.warn(
  File "/usr/lib/python3.8/warnings.py", line 109, in _showwarnmsg
    sw(msg.message, msg.category, msg.filename, msg.lineno,
  File "/home/ubuntu/.local/lib/python3.8/site-packages/airflow/settings.py", line 117, in custom_show_warning
    write_console.print(msg, soft_wrap=True)
  File "/home/ubuntu/.local/lib/python3.8/site-packages/rich/console.py", line 1642, in print
    self._buffer.extend(new_segments)
  File "/home/ubuntu/.local/lib/python3.8/site-packages/rich/console.py", line 842, in __exit__
    self._exit_buffer()
  File "/home/ubuntu/.local/lib/python3.8/site-packages/rich/console.py", line 800, in _exit_buffer
    self._check_buffer()
  File "/home/ubuntu/.local/lib/python3.8/site-packages/rich/console.py", line 1935, in _check_buffer
    self.file.flush()
Erik

2 Answers

Have you tried following the "quick start" instructions?

https://airflow.apache.org/docs/apache-airflow/stable/start/index.html

Airflow has nice, comprehensive instructions on how to get started, and if you follow them step by step you will get Airflow up and running. This can be done either via Docker Compose or in a local virtualenv.

Your problem might be a lack of resources, most likely memory. Airflow is a complex system and needs quite a lot of memory (4GB) to start. This is listed as a prerequisite, especially in the Docker Compose quick start, and Docker Compose will even warn you if you do not have enough resources, so I recommend that route if you want a really solid and robust quick start.
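Outside Docker Compose, a rough way to check the memory prerequisite by hand is to read MemTotal from /proc/meminfo. This is only a sketch: the function names are made up for illustration, and the 4GB threshold is the figure mentioned above, not a value enforced by Airflow itself.

```shell
#!/bin/bash
# Print the MemTotal value (in kB) from /proc/meminfo-style text on stdin.
mem_total_kb() {
  awk '/^MemTotal:/ {print $2}'
}

# Succeed if the given kB value is at least 4 GB (4 * 1024 * 1024 kB).
# Machines sold as "4 GB" often report slightly less, so treat this as a rough check.
has_enough_memory() {
  [ "$1" -ge $((4 * 1024 * 1024)) ]
}

if [ -r /proc/meminfo ]; then
  total_kb="$(mem_total_kb < /proc/meminfo)"
  if has_enough_memory "$total_kb"; then
    echo "memory ok: ${total_kb} kB"
  else
    echo "less than 4 GB of RAM: ${total_kb} kB"
  fi
fi
```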

You need to look at your logs to see why you get broken pipe errors, but a lack of resources is the most likely reason.

Regarding "standalone" mode and the user password: you likely missed what Airflow printed. It generates a password dynamically on startup and tells you which password to use:

standalone | 
standalone | Airflow is ready
standalone | Login with username: admin  password: 4hfH8mATcvMFmne9
standalone | Airflow Standalone is for development purposes only. Do not use this in production!
standalone | 
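If the output scrolls by inside a script, one option is to capture it and pull the password out of the "Login with username" line. The sketch below assumes the line format shown above; standalone_password is a hypothetical helper name, and colored output may need its escape codes stripped first.

```shell
#!/bin/bash
# Read `airflow standalone` output on stdin and print the generated admin password.
# Assumes the "Login with username: admin  password: ..." line format shown above.
standalone_password() {
  sed -n 's/.*Login with username: admin[[:space:]]*password:[[:space:]]*\([^[:space:]]*\).*/\1/p'
}

# Example using the line quoted above
echo 'standalone | Login with username: admin  password: 4hfH8mATcvMFmne9' | standalone_password
```

In practice the input would be a saved copy of the standalone output, for example a file written with tee, fed to the function on stdin.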
Jarek Potiuk
  • I have 64GB of RAM , so it's not that.. what happened was I was starting via a script.. and for some reason that causes MASSIVE problems for apache airflow.. I have since switched to systemd .. if I see it's stable after a week.. I will share my systemd service files .. thank you for the headsup regarding the password being generated.. this is being done in a script so must have been easy to miss – Erik Jan 02 '22 at 15:09
  • also I am not using docker.. but LXD for containerization so.. so I was doing quickstart by hand just FYI – Erik Jan 02 '22 at 15:10
  • Don't use standalone in systemd. "Airflow Standalone is for development purposes only. Do not use this in production!" I know our users are creative, but standalone is REALLY for "interactive" use, not for systemd scripts. It's ONLY for development. Don't ever use it in automated scripts - it will bite you back if you do. – Jarek Potiuk Jan 02 '22 at 19:28
  • thanks , indeed I never actually used standalone and had jumped straight to getting Production to work.. with my systemd scripts it's now super super stable and I've now moved onto getting Connections and Encryption figured out.. great documentation so far on that btw – Erik Jan 04 '22 at 10:13
  • I've now posted my systemD scripts, feedback very welcome and thanks again Jarek – Erik Jan 04 '22 at 10:20

So it turns out that writing a simple script to run Apache Airflow in the background

#!/bin/bash
airflow webserver -D

doesn't play well with Ubuntu 20.04 LTS. I've discovered that I should instead let systemd handle starting and stopping, and this is now working really well. Here's how I inject the needed unit files on Ubuntu 20.04 LTS.

This registers the webserver, scheduler, and triggerer services:

if [ ! -e /etc/systemd/system/airflow-scheduler.service ]; then
  cat <<EOT >> /etc/systemd/system/airflow-scheduler.service
[Unit]
Description=Airflow scheduler daemon

[Service]
Environment="PATH=/home/ubuntu/.local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
User=ubuntu
Type=simple
ExecStart=/home/ubuntu/.local/bin/airflow scheduler
Restart=always
RestartSec=5s

[Install]
WantedBy=multi-user.target
EOT
fi

if [ ! -e /etc/systemd/system/airflow-webserver.service ]; then
  cat <<EOT >> /etc/systemd/system/airflow-webserver.service
[Unit]
Description=Airflow webserver daemon

[Service]
Environment="PATH=/home/ubuntu/.local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
User=ubuntu
Type=simple
ExecStart=/home/ubuntu/.local/bin/airflow webserver
Restart=on-failure
RestartSec=5s
PrivateTmp=true

[Install]
WantedBy=multi-user.target
EOT
fi

if [ ! -e /etc/systemd/system/airflow-triggerer.service ]; then
  cat <<EOT >> /etc/systemd/system/airflow-triggerer.service
[Unit]
Description=Airflow triggerer daemon

[Service]
Environment="PATH=/home/ubuntu/.local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
User=ubuntu
Type=simple
ExecStart=/home/ubuntu/.local/bin/airflow triggerer
Restart=on-failure
RestartSec=5s
PrivateTmp=true

[Install]
WantedBy=multi-user.target
EOT
fi

systemctl daemon-reload
systemctl enable airflow-scheduler
systemctl enable airflow-webserver
systemctl enable airflow-triggerer

Then, to start everything, I simply run

systemctl start airflow-webserver
systemctl start airflow-scheduler
systemctl start airflow-triggerer
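Since the three unit files above are nearly identical, they could probably be generated from one template. This is a sketch only: render_unit is a made-up helper, it takes the Restart= policy as a parameter, and it leaves out the PrivateTmp=true line that the webserver and triggerer units above carry.

```shell
#!/bin/bash
# Print a systemd unit for one Airflow component (scheduler, webserver, triggerer).
# $1 = component name, $2 = Restart= policy. render_unit is a hypothetical helper.
render_unit() {
  local component="$1" restart="$2"
  cat <<EOT
[Unit]
Description=Airflow ${component} daemon

[Service]
Environment="PATH=/home/ubuntu/.local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
User=ubuntu
Type=simple
ExecStart=/home/ubuntu/.local/bin/airflow ${component}
Restart=${restart}
RestartSec=5s

[Install]
WantedBy=multi-user.target
EOT
}

# Preview one unit on stdout; nothing is written under /etc here.
render_unit scheduler always
```

To install a unit, the output can be redirected to the matching file under /etc/systemd/system/ as root, followed by systemctl daemon-reload as above.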

Erik