This question has multiple parts due to the nature of the situation, so please be patient; I will try to clarify as many details as I can in the question and comments.
If the question belongs on another Stack forum, please let me know so that no one's time or energy is wasted.
Thank you in advance, community, for your expertise.
Environment details:
- OS: Ubuntu 22.04.2 LTS
- Python version: 3.10.6
- Airflow version: 2.6.2
- Airflow running in venv: yes

Python scripts and DAGs structure (git-cloned into home):

home
|- ETL
|  |- Project1
|     |- script.py (actual ETL)
|- DAGs
|  |- Project1
|     |- script1_dag.py
|- etllib
   |- lib
      |- lib_file.py (contains SQLAlchemy engines, etc.)

Python scripts and DAGs running in venv: yes

Relevant section of pip freeze:
oracledb==1.3.2
SQLAlchemy==2.0.18
Current State:
I activate the Airflow venv and start airflow scheduler and airflow webserver in two terminals.
Result: the Airflow instance is running. The DAG is visible and runs without any errors (it connects via SQLAlchemy using the oracle+oracledb dialect and extracts table data).
What I would like to do is run airflow scheduler and airflow webserver as services to accomplish the following:
- Start the services when the VM boots
- Auto-restart the services in case of a failure
- Avoid keeping 2 terminal windows open to run 2 services
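For completeness, the systemctl workflow I use to register and enable each unit at boot is the standard one (the unit files live in /etc/systemd/system/, as shown in the status output further down):

```
# Reload systemd so it picks up new/changed unit files
sudo systemctl daemon-reload

# Enable at boot and start now
sudo systemctl enable --now airflow-webserver.service

# Verify
systemctl status airflow-webserver.service
```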
I created an airflow-webserver.service unit file.
Scenario 1:
[Unit]
Description=Apache Airflow Webserver Daemon
After=network.target postgresql.service mysql.service redis.service rabbitmq-server.service
Wants=postgresql.service mysql.service redis.service rabbitmq-server.service
[Service]
EnvironmentFile=/home/airflow/airflow/airflow.env
WorkingDirectory=/home/airflow/airflow/
Type=simple
User=airflow
ExecStart=/bin/bash -c 'source /home/airflow/airflow/env/bin/activate && airflow webserver'
Restart=always
RestartSec=5s
PrivateTmp=true
[Install]
WantedBy=multi-user.target
I reloaded the daemon and started the service. Status: webserver running.
I followed the same unit-file structure and steps for the scheduler as well. Status: scheduler running.
When running the DAG, I got the following error:
[2023-07-26, 22:07:25 UTC] {process_utils.py:185} INFO - Output:
[2023-07-26, 22:07:27 UTC] {process_utils.py:189} INFO - DPI-1047: Cannot locate a 64-bit Oracle Client library: "libnnz21.so: cannot open shared object file: No such file or directory". See https://python-oracledb.readthedocs.io/en//latest/user_guide/initialization.html#setting-the-oracle-client-library-directory for help
Section 3.1.3 of the link suggests that I do the following (note: I am running Thick mode due to DB security, which is configured in the SQLNET and TNSNAMES files):
import oracledb
oracledb.init_oracle_client()
When I did that, I got the following error:
[2023-07-26, 22:10:45 UTC] {process_utils.py:185} INFO - Output:
[2023-07-26, 22:10:47 UTC] {process_utils.py:189} INFO - DPI-1047: Cannot locate a 64-bit Oracle Client library: "libclntsh.so: cannot open shared object file: No such file or directory". See https://python-oracledb.readthedocs.io/en//latest/user_guide/initialization.html#setting-the-oracle-client-library-directory for help
I figured it had something to do with the way bash was running (or not), and/or with the permissions bash has (or doesn't). This leads to Scenario 2.
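My (hedged, given my limited Linux knowledge) understanding of the difference: a systemd service starts with a minimal environment, so an LD_LIBRARY_PATH set in my interactive shell profile would not be visible to the service. That would explain why Thick mode finds the Instant Client libraries in a terminal but not under systemd. A sketch of setting it directly in the unit file; /opt/oracle/instantclient is a placeholder for wherever the client libraries actually live on this VM:

```
[Service]
# Placeholder path: point LD_LIBRARY_PATH at the Oracle Instant Client
# directory so python-oracledb's Thick mode can locate libclntsh.so
Environment="LD_LIBRARY_PATH=/opt/oracle/instantclient"
```

(The same variable could presumably also go into the airflow.env file already referenced by EnvironmentFile=.)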
Scenario 2 (updated airflow-webserver.service unit file), based on this solution:
[Unit]
Description=Apache Airflow Scheduler Daemon
After=network.target postgresql.service mysql.service redis.service rabbitmq-server.service
Wants=postgresql.service mysql.service redis.service rabbitmq-server.service
[Service]
EnvironmentFile=/home/airflow/airflow/airflow.env
WorkingDirectory=/home/airflow/airflow/
Type=simple
User=airflow
ExecStartPre=source /home/airflow/airflow/env/bin/activate
ExecStart=airflow scheduler
Restart=always
RestartSec=5s
PrivateTmp=true
[Install]
WantedBy=multi-user.target
Now the service itself is not starting:
airflow-webserver.service - Apache Airflow Webserver Daemon
Loaded: loaded (/etc/systemd/system/airflow-webserver.service; enabled; vendor preset: enabled)
Active: activating (auto-restart) (Result: exit-code) since Thu 2023-07-27 11:03:59 PDT; 1s ago
Process: 227431 ExecStartPre=source /home/airflow/airflow/env/bin/activate (code=exited, status=203/EXEC)
CPU: 10ms
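Status 203/EXEC makes some sense to me in hindsight: systemd runs Exec* commands directly rather than through a shell, and source is a shell builtin, not an executable, so there is nothing for systemd to exec. Even if it ran, ExecStartPre is a separate process, so its environment changes would not carry over to ExecStart anyway. A sketch of avoiding activation altogether by calling the venv's own entry point (using the venv path from Scenario 1):

```
[Service]
# Calling the console script inside the venv makes "activation" unnecessary:
# the script's shebang points at the venv's python interpreter.
ExecStart=/home/airflow/airflow/env/bin/airflow scheduler
```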
What I would like help with is running the Airflow webserver and scheduler as services and getting the same outcome as when they run in a terminal window, but without actively using a terminal window (potentially without bash?).
Disclaimer: I'm good with Python and Airflow, but not so much with Linux scripting (I know enough to mix and match if and where needed).