
Okay, so this question is multi-part due to the nature of the situation, so please be patient. I will try to clarify as many details as I can in the question and comments.

If this question belongs on another Stack Exchange site, please let me know, so that no one's time and/or energy is wasted.

Thank you in advance, community, for your expertise.

Environment details:

  • OS: Ubuntu 22.04.2 LTS
  • Python version: 3.10.6
  • Airflow version: 2.6.2
  • Running in venv: yes

Python scripts and DAGs structure:

Git-clone home
|-ETL
|  |-Project1
|     |-script.py (actual ETL)
|-DAGs
|  |-Project1
|     |-scipt1_dag.py
|-etllib
   |-lib
      |-lib_file.py (contains SQLAlchemy engines, etc.)

Python scripts and DAGs running in venv: yes

Relevant section of pip freeze:

oracledb==1.3.2
SQLAlchemy==2.0.18

Current state: I activate the Airflow venv and start airflow scheduler and airflow webserver in two terminals. Result: the Airflow instance is running, and the DAG is visible and runs (connecting through SQLAlchemy with the oracle+oracledb dialect and extracting table data) without any errors.

What I would like to do is run airflow scheduler and airflow webserver as services to accomplish the following:

  1. Start the services when VM boots,
  2. Automatically restart the services in case of a failure,
  3. Avoid keeping 2 terminal windows open for the 2 services.

Scenario 1: I created the following airflow-webserver.service unit file:

#!/bin/bash
[Unit]
Description=Apache Airflow Webserver Daemon
After=network.target postgresql.service mysql.service redis.service rabbitmq-server.service
Wants=postgresql.service mysql.service redis.service rabbitmq-server.service

[Service]
EnvironmentFile=/home/airflow/airflow/airflow.env
WorkingDirectory=/home/airflow/airflow/
Type=simple
User=airflow
ExecStart=/bin/bash -c 'source /home/airflow/airflow/env/bin/activate && airflow webserver'
Restart=always
RestartSec=5s
PrivateTmp=true

[Install]
WantedBy=multi-user.target

I reloaded the systemd daemon and started the service. Status: webserver running.

I followed the same unit file structure and steps for the scheduler as well. Status: scheduler running.

When running the DAG, I got the following error:

[2023-07-26, 22:07:25 UTC] {process_utils.py:185} INFO - Output:
[2023-07-26, 22:07:27 UTC] {process_utils.py:189} INFO - DPI-1047: Cannot locate a 64-bit Oracle Client library: "libnnz21.so: cannot open shared object file: No such file or directory". See https://python-oracledb.readthedocs.io/en//latest/user_guide/initialization.html#setting-the-oracle-client-library-directory for help

Section 3.1.3 of that link suggests doing the following (note: I am running in Thick mode because of DB security settings configured in the SQLNET and TNSNAMES files):

import oracledb

oracledb.init_oracle_client()

When I did that, I got the following error:

[2023-07-26, 22:10:45 UTC] {process_utils.py:185} INFO - Output:
[2023-07-26, 22:10:47 UTC] {process_utils.py:189} INFO - DPI-1047: Cannot locate a 64-bit Oracle Client library: "libclntsh.so: cannot open shared object file: No such file or directory". See https://python-oracledb.readthedocs.io/en//latest/user_guide/initialization.html#setting-the-oracle-client-library-directory for help

I figured it had something to do with the way bash was (or wasn't) running, and/or maybe the permissions bash has (or doesn't have). This led to Scenario 2.
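One way to test that hunch, before changing the unit file, is to check what the dynamic loader sees from inside the process that actually runs the DAG. The sketch below is a hypothetical helper (oracle_client_loadable and report are illustrative names, not part of Airflow or python-oracledb). It prints LD_LIBRARY_PATH and tries to load libclntsh.so with ctypes, which raises OSError in broadly the same situations where python-oracledb reports DPI-1047. Running it once from a terminal and once from a task under the systemd service should show whether the difference lies in the environment. Note that on glibc the loader reads LD_LIBRARY_PATH when the process starts, so setting it from inside Python afterwards generally does not help.

```python
import ctypes
import os


def oracle_client_loadable(libname: str = "libclntsh.so") -> bool:
    """Return True if the dynamic loader can load `libname` (and its dependencies)."""
    try:
        # Uses dlopen() under the hood: searches LD_LIBRARY_PATH (as set at
        # process start), the ldconfig cache, and the default system directories.
        ctypes.CDLL(libname)
    except OSError:
        # Library not found, or one of its dependencies (e.g. libnnz21.so) missing.
        return False
    return True


def report(libname: str = "libclntsh.so") -> str:
    """One-line summary of the loader environment of the current process."""
    ld_path = os.environ.get("LD_LIBRARY_PATH", "<unset>")
    status = "loadable" if oracle_client_loadable(libname) else "NOT loadable"
    return f"LD_LIBRARY_PATH={ld_path}; {libname} is {status}"


if __name__ == "__main__":
    print(report())
```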

Scenario 2: updated airflow-webserver.service unit file, based on this solution:

#!/bin/bash
[Unit]
Description=Apache Airflow Scheduler Daemon
After=network.target postgresql.service mysql.service redis.service rabbitmq-server.service
Wants=postgresql.service mysql.service redis.service rabbitmq-server.service

[Service]
EnvironmentFile=/home/airflow/airflow/airflow.env
WorkingDirectory=/home/airflow/airflow/
Type=simple
User=airflow
ExecStartPre=source /home/airflow/airflow/env/bin/activate 
ExecStart=airflow scheduler
Restart=always
RestartSec=5s
PrivateTmp=true

[Install]
WantedBy=multi-user.target

Now the service itself is not starting:

airflow-webserver.service - Apache Airflow Webserver Daemon
     Loaded: loaded (/etc/systemd/system/airflow-webserver.service; enabled; vendor preset: enabled)
     Active: activating (auto-restart) (Result: exit-code) since Thu 2023-07-27 11:03:59 PDT; 1s ago
    Process: 227431 ExecStartPre=source /home/airflow/airflow/env/bin/activate (code=exited, status=203/EXEC)
        CPU: 10ms
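The status=203/EXEC result means systemd's attempt to execute the command itself failed. systemd does not run ExecStartPre= or ExecStart= lines through a shell, and source is a bash builtin rather than a program, so there is nothing to execute. Even if it did run, each Exec line is a separate process, so an environment "activated" in ExecStartPre= would not carry over to ExecStart=. The sketch below mimics that no-shell execution with Python's subprocess (shell=False); exec_like_systemd is an illustrative name, not a systemd API.

```python
import subprocess
import sys
from typing import List, Optional


def exec_like_systemd(argv: List[str]) -> Optional[int]:
    """Run argv directly (no shell), roughly as systemd runs ExecStart= lines.

    Returns the exit code, or None when the program cannot be executed at all,
    which is the situation systemd reports as status=203/EXEC.
    """
    try:
        return subprocess.run(argv, check=False).returncode
    except OSError:
        # e.g. FileNotFoundError: no executable named argv[0] exists.
        return None


if __name__ == "__main__":
    # `source` is a shell builtin, so there is no `source` program to run.
    print(exec_like_systemd(["source", "/home/airflow/airflow/env/bin/activate"]))
    # A real executable, such as the Python interpreter itself, runs fine.
    print(exec_like_systemd([sys.executable, "-c", "pass"]))
```

A common alternative, not taken from the original post, is to skip activation and point ExecStart= at the venv's own script, e.g. /home/airflow/airflow/env/bin/airflow scheduler, assuming Airflow was pip-installed into that venv.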

What I would like help with is running the Airflow webserver and scheduler as services, with the same outcome as running them in a terminal window, but without actively using a terminal window (potentially without bash?).

Disclaimer: I'm good with Python and Airflow, but not so much with Linux scripting (I know enough to mix and match if and where needed).

  • Re DPI-1047, if you have root access, then use `ldconfig` as shown in the Instant Client installation instructions. You will probably need to install Instant Client in a standard system location - not your home directory - else you will get DPI-1047 for no obvious reason. The `ldconfig` command will be run automatically with the Instant Client RPMs, e.g. run something like `alien -i --scripts oracle-instantclient19.19-basic-19.19.0.0.0-1.x86_64.rpm; apt-get install libaio1`. If you don't have root access you'll need to find a way to set LD_LIBRARY_PATH before Airflow starts. – Christopher Jones Jul 28 '23 at 00:46
  • I’m voting to close this question because as the systemd tag states: systemd questions should be for *programming questions* using systemd or its libraries. Questions about *configuring the daemon* (including writing unit files) are better directed to Unix & Linux: https://unix.stackexchange.com. Please delete this. – Rob Jul 28 '23 at 07:00
  • @Rob thank you for pointing me in the right direction. I'll post this question in the Unix and Linux Stack Exchange forum. – Samuel Gottipalli Jul 28 '23 at 16:25
  • @ChristopherJones, Oracle Instant Client is installed and the path is set at `/opt/oracle_instantclient_21_10`. `ldconfig` is set as `export LD_LIBRARY_PATH=/opt/oracle_instantclient_21_10:$LD_LIBRARY_PATH` in the last line of `~/.bashrc`, and `libaio1` is installed as well. All done via `sudo`. – Samuel Gottipalli Jul 28 '23 at 16:34
  • You need to delete this one. It's off topic and doesn't belong here. – Rob Jul 28 '23 at 16:57
  • Oh, how I love rules engines: neither Unix & Linux Stack Exchange nor Ask Ubuntu Stack Exchange would let me post this question, because it thinks my question is "spam". I also wonder why it is not possible to get help here, since I can see hundreds of other questions related to systemd on Stack Overflow. @Rob, I appreciate you trying to keep the forum clean and free of irrelevant questions, but since I'm not allowed to post in the right forum, this is going to stay here for now. – Samuel Gottipalli Jul 28 '23 at 17:13
  • LD_LIBRARY_PATH is not ldconfig. And since you seem to be running as a daemon, setting the env var in your .bashrc is going to be ignored. Use ldconfig as shown in the [install instructions](https://python-oracledb.readthedocs.io/en/latest/user_guide/installation.html#oracle-instant-client-zip-files) if you have installed ZIP files, or use the RPMs. – Christopher Jones Jul 28 '23 at 21:47
  • Thank you @ChristopherJones. Taking your comment, I wrote the answer below. Please feel free to edit, if you find any misrepresentations there. – Samuel Gottipalli Jul 28 '23 at 22:36

1 Answer


Thanks to @ChristopherJones and his response, I solved the problem!

This page (section 2.4.2, and subsequently the "Oracle Instant Client for Linux" installation instructions section) covers installing Oracle Instant Client correctly in multiple environments.

Section 2.4.2.1 bullet point 4 states the following:

If there is no other Oracle software on the machine that will be impacted, permanently add Instant Client to the runtime link path. For example, with sudo or as the root user:

sudo sh -c "echo /opt/oracle/instantclient_21_6 > /etc/ld.so.conf.d/oracle-instantclient.conf"
sudo ldconfig

Alternatively, set the environment variable LD_LIBRARY_PATH to the appropriate directory for the Instant Client version. For example:

export LD_LIBRARY_PATH=/opt/oracle/instantclient_21_6:$LD_LIBRARY_PATH

During my setup, I followed the alternative instruction (based on a Google search result) and failed to follow the official ldconfig instructions.

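For an Airflow instance running in a virtual environment to access oracledb (in Thick mode) while running as a daemon, it is critical to register the Instant Client directory with ldconfig, because an LD_LIBRARY_PATH exported in .bashrc is ignored by daemons: a systemd service does not start an interactive bash shell, so .bashrc is never read.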
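To confirm that the ldconfig registration took effect, independently of any shell or service environment, one option is to look for the Instant Client libraries in the loader cache. The sketch below parses the output of ldconfig -p. The helper names are illustrative, and it assumes ldconfig is on PATH (on some systems it lives in /sbin) and that the libraries of interest are libclntsh.so* and libnnz21.so, the two named in the DPI-1047 errors above.

```python
import subprocess
from typing import Dict


def parse_ldconfig_cache(output: str) -> Dict[str, str]:
    """Map library file names to their paths, given `ldconfig -p` output."""
    libs: Dict[str, str] = {}
    for line in output.splitlines():
        if "=>" not in line:
            continue  # skips the "N libs found in cache ..." header line
        left, path = line.split("=>", 1)
        name = left.strip().split()[0]  # drop the "(libc6,x86-64)" annotation
        libs.setdefault(name, path.strip())  # keep the first entry per name
    return libs


def cached_oracle_libs() -> Dict[str, str]:
    """Return the Oracle client libraries currently present in the ldconfig cache."""
    out = subprocess.run(
        ["ldconfig", "-p"], capture_output=True, text=True, check=True
    ).stdout
    wanted = ("libclntsh.so", "libnnz21.so")
    return {n: p for n, p in parse_ldconfig_cache(out).items() if n.startswith(wanted)}
```

An empty dict from cached_oracle_libs() after running sudo ldconfig would suggest that the .conf file in /etc/ld.so.conf.d points at the wrong directory.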