Does anyone have experience installing Airflow on Compute Engine? I have been searching Google and asking ChatGPT, but without success so far. I would like to know the correct sequence of commands for installing Airflow on Compute Engine; I suspect my issues come from not understanding the proper installation steps. Thank you for your response.
-
Airflow installation is documented. If you have a problem, then post a question: https://airflow.apache.org/docs/apache-airflow/stable/installation/index.html – John Hanley Jun 13 '23 at 02:08
-
My issue is that I don't understand the process of installing Airflow, and in the provided link, I couldn't find the installation steps specifically for Compute Engine. I apologize for being a beginner. Previously, I worked as a data analyst and I am currently learning to become a data engineer – linggapratama28 Jun 13 '23 at 02:26
-
Did you have time to check my [answer](https://stackoverflow.com/help/someone-answers)? Did it help you solve your issue? If not, I am happy to assist further. – Srividya Jun 19 '23 at 06:46
1 Answer
Airflow lets users define workflows as Directed Acyclic Graphs (DAGs) of tasks. It can connect to multiple data sources and send alerts via email/notification about a job's status.
Installing Airflow on Google Compute Engine can be done in five steps:
Step 1: Create a Compute Engine instance for the Airflow installation
- Log in to the Cloud Console and, in the search box, search for "Create an Instance".
- Click New VM Instance, give the instance a name, and choose a configuration that fits your needs. For example: machine type e2-standard-2 (2 vCPU, 8 GB memory), a Debian 10 image, and a 50 GB HDD boot disk.
- Click Create to create the Compute Engine VM instance.
Alternatively, you can create the VM instance from the gcloud command line, as sketched below.
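A minimal gcloud sketch; the instance name airflow-vm and the zone are placeholders to adjust, and the debian-10 image family may need swapping for a newer release. Machine type and disk size match the example above:
# Placeholder instance name and zone; adjust to your project.
gcloud compute instances create airflow-vm \
    --zone=us-central1-a \
    --machine-type=e2-standard-2 \
    --image-family=debian-10 \
    --image-project=debian-cloud \
    --boot-disk-size=50GB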
Step 2: Install Apache Airflow
Once the instance is created, click SSH to open a terminal.
In the terminal, update the package lists, upgrade the machine, and install pip for Python 3 by running the following commands:
sudo apt update
sudo apt -y upgrade
sudo apt install -y wget
sudo apt install -y python3-pip
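To confirm the tooling installed correctly before proceeding, you can check the versions:
python3 --version
pip3 --version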
You can use Anaconda or Miniconda to create a virtual environment for the Airflow installation. Refer to the doc written by Vishal Agarwal for the full commands for creating the virtual environment and installing Airflow; a minimal sketch follows below.
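A minimal sketch, assuming Miniconda is already installed; the environment name airflow_demo matches the one used in Step 5, and the Airflow and Python versions are examples, not requirements:
# Create and activate a conda environment (name and versions are examples).
conda create -y -n airflow_demo python=3.10
conda activate airflow_demo
# Install Airflow pinned against its official constraints file.
AIRFLOW_VERSION=2.6.1
CONSTRAINT_URL="https://raw.githubusercontent.com/apache/airflow/constraints-${AIRFLOW_VERSION}/constraints-3.10.txt"
pip install "apache-airflow==${AIRFLOW_VERSION}" --constraint "${CONSTRAINT_URL}"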
Step 3: Set Up Airflow
After installing Airflow, initialize its metadata database and create an admin user by running the following commands:
airflow db init
airflow users create -r Admin -u <username> -p <password> -e <email> -f <first name> -l <last name>
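For example, with hypothetical placeholder values (replace them with your own credentials):
# Hypothetical values for illustration only.
airflow users create -r Admin -u admin -p changeme -e admin@example.com -f Ada -l Lovelace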
Step 4: Open the Firewall
The Airflow webserver listens on port 8080 by default. In the GCP console, navigate to VPC Network -> Firewall and create a firewall rule: add port 8080 under TCP and click Create.
Apply the firewall rule to the Compute Engine instance so that port 8080 is reachable, as sketched below.
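A gcloud sketch of the same rule; the rule name allow-airflow-8080, the network tag airflow-server, and the instance name are placeholders:
# Placeholder rule name and tag; allows inbound TCP traffic on port 8080.
gcloud compute firewall-rules create allow-airflow-8080 \
    --direction=INGRESS --action=ALLOW \
    --rules=tcp:8080 \
    --target-tags=airflow-server
# Attach the tag to the VM so the rule applies to it.
gcloud compute instances add-tags airflow-vm \
    --tags=airflow-server --zone=us-central1-a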
Step 5: Start Airflow
Once the firewall is set up correctly, start the Airflow webserver with the following command:
airflow webserver -p 8080
Open another terminal and start the Airflow Scheduler:
export AIRFLOW_HOME=/home/user/airflow_demo
cd airflow_demo
conda activate airflow_demo
airflow db init
airflow scheduler
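If you'd rather not keep two terminals open, both commands accept a -D flag to run as background daemons:
airflow webserver -p 8080 -D
airflow scheduler -D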
Once the scheduler has started, open the Airflow console in a browser at http://<vm-IP-address>:8080 (the webserver serves plain HTTP by default).
Log in with the username and password created in Step 3. Now you can create DAGs in the Airflow UI.
