I am trying to create an Airflow DAG from which I want to spin up a Compute Engine instance running a Docker image stored in Google Container Registry.

In other words, I want to replicate gcloud compute instances create-with-container in an Airflow DAG using Google Cloud operators. I searched for Airflow operators for such operations but couldn't find any way to make them work.

Possible references:

  1. https://airflow.apache.org/docs/apache-airflow-providers-google/stable/operators/cloud/compute.html

  2. https://cloud.google.com/composer/docs/connect-gce-vm-sshoperator


1 Answer

A simple and clean solution to run a premade container on a VM from Airflow is to chain the three steps below:

  1. create a fresh new VM (through a BashOperator) with a startup script that pulls/runs the container and shuts the VM down when the run is done;
  2. use a PythonSensor to check when the VM is stopped (i.e. the container has finished running);
  3. delete the VM (through a BashOperator) so that the previous steps can be repeated the next time the Airflow DAG is triggered.

All we need are the bash commands below:

bash_cmd = {
    'active_account': \
        'gcloud auth activate-service-account MYCLIENTEMAIL '
        '--key-file=/PATH/TO/MY/JSON/SERVICEACCOUNT',
    'set_project': \
        'gcloud config set project MYPROJECTID',
    'list_vm': \
        'gcloud compute instances list',
    'create_vm': \
        'gcloud compute instances create-with-container VMNAME '
        '--project=MYPROJECTID --zone=MYZONE --machine-type=e2-medium '
        '--image=projects/cos-cloud/global/images/cos-stable-101-17162-40-5 '
        '--boot-disk-size=10GB --boot-disk-type=pd-balanced '
        '--boot-disk-device-name=VMNAME '
        '--container-image=eu.gcr.io/MYPROJECTID/MYCONTAINER --container-restart-policy=always '
        '--labels=container-vm=cos-stable-101-17162-40-5 --no-shielded-secure-boot '
        '--shielded-vtpm --shielded-integrity-monitoring '
        '--metadata startup-script="#!/bin/bash\n sleep 10\n sudo useradd -m bob\n sudo -u bob docker-credential-gcr configure-docker\n sudo usermod -aG docker bob\n sudo -u bob docker run eu.gcr.io/MYPROJECTID/MYCONTAINER\n sudo poweroff" ',
    'delete_vm': \
        'gcloud compute instances delete VMNAME --zone=MYZONE --delete-disks=boot',
}

active_account and set_project activate the service account and set the correct working project (where we want to run the VMs), respectively. This is needed when Airflow runs outside the GCP project where the VMs are instantiated. It's also important that the service account used has Compute Engine privileges. The container images to run must be located in the Container Registry of the same project where the VMs are instantiated.
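
If the service account does not yet have those privileges, granting them is a one-off setup step. A minimal sketch (the role name is an assumption, adjust it to your own policy; roles/compute.instanceAdmin.v1 is a common choice for managing instances):

# hypothetical one-off setup: grant the Airflow service account permission
# to create/delete Compute Engine instances (role name is an assumption)
grant_compute_role = (
    'gcloud projects add-iam-policy-binding MYPROJECTID '
    '--member="serviceAccount:MYCLIENTEMAIL" '
    '--role="roles/compute.instanceAdmin.v1"'
)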

list_vm returns the list of the existing VMs in the project, together with their properties and status (RUNNING/TERMINATED).
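
If you prefer not to parse the whole table, here is a minimal sketch of a direct status query, assuming gcloud's standard --filter and --format flags and reusing the bash_cmd dict above (it returns an empty string when the VM does not exist):

def vm_status(vm_name='VMNAME'):
    "sketch: return the status (RUNNING/TERMINATED/...) of a single VM"
    output = subprocess.check_output(
        bash_cmd['active_account'] + " && " +
        bash_cmd['set_project'] + " && " +
        f'gcloud compute instances list --filter="name={vm_name}" '
        '--format="value(status)"',
        shell=True
    )
    return output.decode("utf-8").strip()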

create_vm creates the VM and attaches the container to run from the Container Registry. The command to create the VM can be customized according to your needs. Importantly, you must add a --metadata startup-script that runs the container and powers off the VM when the container finishes running (to see how the startup script is generated, see here).
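
For readability, this is the startup script embedded in create_vm, written out as a plain multi-line string: it waits for boot, configures Docker credentials for the Container Registry, runs the container, and powers the VM off so the sensor can detect completion.

startup_script = """#!/bin/bash
sleep 10
sudo useradd -m bob
sudo -u bob docker-credential-gcr configure-docker
sudo usermod -aG docker bob
sudo -u bob docker run eu.gcr.io/MYPROJECTID/MYCONTAINER
sudo poweroff
"""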

delete_vm simply deletes the VM created by create_vm.
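
One practical caveat: when gcloud runs non-interactively (as it does under a BashOperator), the delete confirmation prompt can get in the way. The global --quiet flag disables prompting and is a common addition when scripting gcloud, e.g.:

    'delete_vm': \
        'gcloud compute instances delete VMNAME --zone=MYZONE '
        '--delete-disks=boot --quiet',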

All these commands can be combined to work together in an Airflow DAG in this way:

import re
import datetime
import subprocess

from airflow import models
from airflow.sensors.python import PythonSensor
from airflow.operators.bash import BashOperator

# name of the VM created by bash_cmd['create_vm'] (dict defined above)
VMNAME = 'VMNAME'


def vm_run_check():
    "function to list all the VMs and check their status"
    
    finish_run = False
    output = subprocess.check_output(
        bash_cmd['active_account'] + " && " + \
        bash_cmd['set_project'] + " && " + \
        bash_cmd['list_vm'], 
        shell=True
    )
    output = output.decode("utf-8").split("\n")[:-1]

    # parse the fixed-width table printed by `gcloud compute instances list`:
    # the header row (output[0]) defines the column boundaries, which are used
    # to slice every following row into a {column_name: value} dict
    machines = []
    for i in range(1, len(output)):
        m = {}
        for match in re.finditer(r"([A-Z_]+)( +)?", output[0] + " " * 10):
            span = match.span()
            m[match.group().strip()] = output[i][span[0]:span[1]].strip()
        machines.append(m)
    machines = {m['NAME']: m for m in machines}
    
    if VMNAME in machines:
        if machines[VMNAME]['STATUS'] == 'TERMINATED':
            finish_run = True
    
    return finish_run


default_args = {
    'owner': 'airflow',
    'depends_on_past': False,
    'email': [''],
    'email_on_failure': False,
    'email_on_retry': False,
    'retries': 0,
}

with models.DAG(
        'MYDAGNAME',
        catchup=False,
        default_args=default_args,
        start_date=datetime.datetime.now() - datetime.timedelta(days=3),
        schedule_interval='0 4 * * *',  # every day at 04:00 AM UTC
) as dag:
    
    
    create_vm = BashOperator(
           task_id="create_vm", 
           bash_command = bash_cmd['active_account'] + " && " + \
                            bash_cmd['set_project'] + " && " + \
                            bash_cmd['create_vm']
    )
    
    sensor_vm_run = PythonSensor(
        task_id="sensor_vm_run",
        python_callable=vm_run_check,
        poke_interval=60*2,  # check every 2 minutes
        timeout=60*60,  # give up after an hour
        soft_fail=True,
        mode="reschedule",
    )
    
    delete_vm = BashOperator(
           task_id="delete_vm", 
           bash_command = bash_cmd['active_account'] + " && " + \
                            bash_cmd['set_project'] + " && " + \
                            bash_cmd['delete_vm']
    )
    
    create_vm >> sensor_vm_run >> delete_vm