
Does anyone have experience installing Airflow on Compute Engine? I have been searching Google and asking ChatGPT, but I have not been successful so far. I would like to know the correct sequence of commands for installing Airflow on Compute Engine; I believe my installation issues may come from not understanding the proper installation steps. Thank you for your response.

  • Airflow installation is documented. If you have a problem, then post a question: https://airflow.apache.org/docs/apache-airflow/stable/installation/index.html – John Hanley Jun 13 '23 at 02:08
  • My issue is that I don't understand the process of installing Airflow, and I couldn't find installation steps specifically for Compute Engine in the provided link. I apologize for being a beginner; I previously worked as a data analyst and am currently learning to become a data engineer. – linggapratama28 Jun 13 '23 at 02:26
  • Did you have time to check my [answer](https://stackoverflow.com/help/someone-answers)? Did it help you solve your issue? If not, I am happy to assist further. – Srividya Jun 19 '23 at 06:46

1 Answer


Airflow lets users define workflows as Directed Acyclic Graphs (DAGs) of tasks. It can connect to multiple data sources and send alerts via email or other notifications about a job's status.

Integrating Airflow on Google Compute Engine is easily done in 5 steps:

Step-1: Create a Compute Engine instance to set up the Google Airflow integration

  • Log in to the Cloud Console and, in the search box, search for “Create an Instance“.
  • Click New VM instance, provide the instance's name, and select a machine configuration that fits your requirements. For example: machine type e2-standard-2 (2 vCPU, 8 GB memory), a Debian 10 image, and a 50 GB boot disk.
  • Click Create to create the Compute Engine VM instance.

You can also create a Compute Engine VM instance using the gcloud command line.
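
As a rough sketch (the instance name, zone, and image family below are illustrative placeholders, not values from the original answer):

    # create a VM matching the console settings from Step 1
    gcloud compute instances create airflow-vm \
        --zone=us-central1-a \
        --machine-type=e2-standard-2 \
        --image-family=debian-10 \
        --image-project=debian-cloud \
        --boot-disk-size=50GB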

Step-2: Install Apache Airflow

  • Once the instance is created, click SSH to open a terminal.

  • Once the terminal is up and running, update the machine and install wget and pip for Python 3 by running the following commands:

    sudo apt update

    sudo apt -y upgrade

    sudo apt-get install wget

    sudo apt install -y python3-pip

You can use conda (for example, via a Miniconda installation) to create a virtual environment for the Google Airflow integration. Refer to the doc written by Vishal Agarwal for the commands to create a virtual environment and install Airflow. A plain-pip alternative is sketched below.
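
If you prefer to skip conda, here is a minimal sketch using Python's built-in venv and Airflow's official constraint files (the Airflow version below is an assumption; substitute a current release):

    # venv support may need to be installed first on Debian
    sudo apt install -y python3-venv

    # create and activate a virtual environment (path is arbitrary)
    python3 -m venv ~/airflow_venv
    source ~/airflow_venv/bin/activate

    # install Airflow pinned against its version-matched constraint file
    AIRFLOW_VERSION=2.6.1
    PYTHON_VERSION="$(python3 -c 'import sys; print(f"{sys.version_info.major}.{sys.version_info.minor}")')"
    pip install "apache-airflow==${AIRFLOW_VERSION}" \
        --constraint "https://raw.githubusercontent.com/apache/airflow/constraints-${AIRFLOW_VERSION}/constraints-${PYTHON_VERSION}.txt"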

Step-3: Setting Up Airflow

After installing Airflow, you need to set it up and initialize it. Run the following commands to initialize the metadata database and create an Airflow admin user:

airflow db init
airflow users create -r Admin -u <username> -p <password> -e <email> -f <first name> -l <last name>
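
For example, with illustrative placeholder values (choose your own credentials):

    airflow users create -r Admin -u admin -p 'change-me' -e admin@example.com -f Admin -l User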

Step-4: Open Firewall

The Airflow webserver listens on port 8080 by default. In the GCP console, navigate to VPC Network -> Firewall and create a firewall rule: add port 8080 under TCP and click Create.

Then apply the firewall rule to the Compute Engine instance so that port 8080 is reachable; a gcloud sketch is shown below.
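
A rough command-line equivalent (the rule name and target tag are placeholders of my choosing, and the rule as written allows traffic from any source):

    # create a rule allowing inbound TCP 8080 for instances tagged "airflow"
    gcloud compute firewall-rules create allow-airflow-webserver \
        --allow=tcp:8080 \
        --target-tags=airflow

    # attach the tag to the VM so the rule applies to it
    gcloud compute instances add-tags airflow-vm --zone=us-central1-a --tags=airflow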

Step-5: Start Airflow

Once the firewall is set up correctly, start the Airflow webserver with the following command:

airflow webserver -p 8080
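
If you want the webserver to keep running after the SSH session closes, Airflow also supports daemonizing it:

    airflow webserver -p 8080 -D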

Open another terminal and start the Airflow Scheduler:

export AIRFLOW_HOME=/home/user/airflow_demo
cd airflow_demo
conda activate airflow_demo
airflow db init
airflow scheduler

Once the scheduler is started, open the Airflow console from a browser by going to http://<vm-IP-address>:8080 (the webserver serves plain HTTP by default).

Log in with the username and password created in Step 3. Now you can create DAGs in the Airflow UI.
