A simple and clean solution to run a premade container on VMs with Airflow consists of chaining the three steps below (sketched right after this list and fleshed out in full at the end):
- create a fresh new VM (through a BashOperator) with a startup script that pulls/runs the container and shuts the VM down when the run is done;
- use a PythonSensor to check when the VM is stopped (i.e. the container has finished running);
- delete the VM (through a BashOperator) so that the previous steps can be repeated the next time the Airflow DAG is triggered.
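Here is a shape-only sketch of the resulting DAG, assuming Airflow 2.x, with placeholder commands and a dummy check; the real gcloud commands and the real sensor callable are what the rest of this post builds up:

import datetime
from airflow import models
from airflow.operators.bash import BashOperator
from airflow.sensors.python import PythonSensor

with models.DAG(
    "container_on_vm_sketch",   # placeholder DAG id
    start_date=datetime.datetime(2023, 1, 1),
    schedule_interval=None,
    catchup=False,
) as dag:
    create_vm = BashOperator(task_id="create_vm", bash_command="echo create VM")          # placeholder
    sensor_vm_run = PythonSensor(task_id="sensor_vm_run", python_callable=lambda: True)   # placeholder
    delete_vm = BashOperator(task_id="delete_vm", bash_command="echo delete VM")          # placeholder

    create_vm >> sensor_vm_run >> delete_vm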
All we need are the bash commands below:
bash_cmd = {
    'active_account': \
        'gcloud auth activate-service-account MYCLIENTEMAIL '
        '--key-file=/PATH/TO/MY/JSON/SERVICEACCOUNT',
    'set_project': \
        'gcloud config set project MYPROJECTID',
    'list_vm': \
        'gcloud compute instances list',
    'create_vm': \
        'gcloud compute instances create-with-container VMNAME '
        '--project=MYPROJECTID --zone=MYZONE --machine-type=e2-medium '
        '--image=projects/cos-cloud/global/images/cos-stable-101-17162-40-5 '
        '--boot-disk-size=10GB --boot-disk-type=pd-balanced '
        '--boot-disk-device-name=VMNAME '
        '--container-image=eu.gcr.io/MYPROJECTID/MYCONTAINER --container-restart-policy=always '
        '--labels=container-vm=cos-stable-101-17162-40-5 --no-shielded-secure-boot '
        '--shielded-vtpm --shielded-integrity-monitoring '
        '--metadata startup-script="#!/bin/bash\n sleep 10\n sudo useradd -m bob\n sudo -u bob docker-credential-gcr configure-docker\n sudo usermod -aG docker bob\n sudo -u bob docker run eu.gcr.io/MYPROJECTID/MYCONTAINER\n sudo poweroff" ',
    'delete_vm': \
        'gcloud compute instances delete VMNAME --zone=MYZONE --delete-disks=boot',
}
active_account and set_project are used, respectively, to activate the service account and to set the correct working project (the one where we want to run the VMs). This is required when Airflow runs outside the GCP project where the VMs are instantiated. The service account used must also have Compute Engine privileges, and the container image to run must be located in the Container Registry of the same project where the VMs are instantiated.
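If the service account still lacks those privileges, a project admin can grant them with a one-off command along the lines of the sketch below (same MYPROJECTID/MYCLIENTEMAIL placeholders as above; roles/compute.instanceAdmin.v1 is just one role that covers instance creation and deletion, and your setup may call for a different one):

# hypothetical one-off admin command, not part of the DAG
grant_compute_role = (
    'gcloud projects add-iam-policy-binding MYPROJECTID '
    '--member="serviceAccount:MYCLIENTEMAIL" '
    '--role="roles/compute.instanceAdmin.v1"'
)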
list_vm returns the list of the existing VMs in the project, with their main properties and status (RUNNING/TERMINATED).
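For reference, list_vm returns a fixed-width table along these lines (the values below are purely illustrative); the sensor callable further down parses it by locating the header columns:

NAME    ZONE    MACHINE_TYPE  PREEMPTIBLE  INTERNAL_IP  EXTERNAL_IP  STATUS
VMNAME  MYZONE  e2-medium                  10.132.0.2   34.76.0.10   TERMINATED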
create_vm creates the VM, attaching the container to run from the Container Registry. The creation command can be customized according to your needs. Importantly, you must add the --metadata startup-script, which runs the container and powers the VM off when the container finishes running (to see how the startup script is generated, see here).
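For readability, the startup script packed into the \n-separated metadata string above expands to:

#!/bin/bash
sleep 10
sudo useradd -m bob
sudo -u bob docker-credential-gcr configure-docker
sudo usermod -aG docker bob
sudo -u bob docker run eu.gcr.io/MYPROJECTID/MYCONTAINER
sudo poweroff

It waits a few seconds for the OS to settle, configures Docker credentials for the container registry under a dedicated user, runs the container, and powers the VM off once the container exits.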
delete_vm simply deletes the VM created by create_vm.
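One caveat: gcloud asks for confirmation before deleting an instance, which a non-interactive BashOperator cannot answer, so you may want to append --quiet to the command; a possible variant of the entry above:

# possible variant that skips gcloud's interactive confirmation prompt
bash_cmd['delete_vm'] = (
    'gcloud compute instances delete VMNAME '
    '--zone=MYZONE --delete-disks=boot --quiet'
)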
All these commands can be combined into an Airflow DAG as follows:
import re
import datetime
import subprocess

from airflow import models
from airflow.operators.bash import BashOperator
from airflow.sensors.python import PythonSensor

# name of the VM to create/monitor/delete (same placeholder used in bash_cmd)
VMNAME = 'VMNAME'
def vm_run_check():
    """List all the VMs in the project and check whether VMNAME has stopped."""
    finish_run = False
    output = subprocess.check_output(
        bash_cmd['active_account'] + " && " +
        bash_cmd['set_project'] + " && " +
        bash_cmd['list_vm'],
        shell=True
    )
    output = output.decode("utf-8").split("\n")[:-1]
    # parse the fixed-width table returned by `gcloud compute instances list`,
    # using the header line to locate the column boundaries
    machines = []
    for i in range(1, len(output)):
        m = {}
        for match in re.finditer(r"([A-Z_]+)( +)?", output[0] + " " * 10):
            span = match.span()
            m[match.group().strip()] = output[i][span[0]:span[1]].strip()
        machines.append(m)
    machines = {m['NAME']: m for m in machines}
    # the sensor succeeds once the VM exists and has powered itself off
    if VMNAME in machines:
        if machines[VMNAME]['STATUS'] == 'TERMINATED':
            finish_run = True
    return finish_run
default_args = {
    'owner': 'airflow',
    'depends_on_past': False,
    'email': [''],
    'email_on_failure': False,
    'email_on_retry': False,
    'retries': 0,
}
with models.DAG(
    'MYDAGNAME',
    catchup=False,
    default_args=default_args,
    start_date=datetime.datetime.now() - datetime.timedelta(days=3),
    schedule_interval='0 4 * * *',  # every day at 04:00 AM UTC
) as dag:

    create_vm = BashOperator(
        task_id="create_vm",
        bash_command=bash_cmd['active_account'] + " && " +
                     bash_cmd['set_project'] + " && " +
                     bash_cmd['create_vm'],
    )

    sensor_vm_run = PythonSensor(
        task_id="sensor_vm_run",
        python_callable=vm_run_check,
        poke_interval=60 * 2,  # check every 2 minutes...
        timeout=60 * 60,       # ...for up to an hour
        soft_fail=True,
        mode="reschedule",
    )

    delete_vm = BashOperator(
        task_id="delete_vm",
        bash_command=bash_cmd['active_account'] + " && " +
                     bash_cmd['set_project'] + " && " +
                     bash_cmd['delete_vm'],
    )

    create_vm >> sensor_vm_run >> delete_vm
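Two notes on the sensor settings: mode="reschedule" releases the worker slot between pokes instead of keeping it busy for the whole wait, and soft_fail=True marks the sensor as skipped (rather than failed) if the VM has not reached TERMINATED within the timeout.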