
I am evaluating Apache Airflow for production use in a data environment, and I would like to know whether Airflow can run operators in self-contained Docker environments on an auto-scaling Kubernetes cluster.

I found the following operator: KubernetesPodOperator, which seems to do the job, but the only examples I have found are on Google Cloud. I would like to run this on AWS; however, I haven't found any examples of how this would be done. I believe AWS EKS or AWS Fargate might fit the bill, but I'm not sure.

Can anyone with Airflow experience let me know whether this is possible? I have looked online and haven't found anything clear yet.

maldman

2 Answers


We have been using Fargate and Airflow in production and the experience so far has been good.

We have been using it for transient workloads and it is turning out to be cheaper for us than having a dedicated Kubernetes cluster. Also, there is no management overhead of any kind.

GitHub: Airflow DAG with ECSOperatorConfig
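
For reference, a minimal sketch of what such a DAG can look like (assuming Airflow 1.10's contrib ECSOperator; the cluster name, task definition, container name, subnet, and region are placeholders, and the task definition must already be registered in ECS):

from datetime import datetime

from airflow.models import DAG
from airflow.contrib.operators.ecs_operator import ECSOperator

dag = DAG(
    dag_id='fargate_example',
    start_date=datetime(2019, 7, 1),
    schedule_interval=None,
)

automation_task = ECSOperator(
    task_id='run_my_automation_task',
    dag=dag,
    cluster='my-fargate-cluster',          # placeholder ECS cluster
    task_definition='my_automation_task',  # must already exist in ECS
    launch_type='FARGATE',
    overrides={
        'containerOverrides': [{
            'name': 'my-container',  # container name from the task definition
            'cpu': 512,              # per-task CPU/RAM overrides, as in the
            'memory': 1024,          # ECS run-task API
        }],
    },
    network_configuration={
        'awsvpcConfiguration': {
            'subnets': ['subnet-xxxxxxxx'],  # placeholder subnet
            'assignPublicIp': 'ENABLED',
        },
    },
    region_name='us-east-1',
)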

GTO
  • Thanks GTO for the example. Is there a way to define CPU and RAM requirements for different operators? – maldman Jul 11 '19 at 11:58
  • Yes, for individual operators you can define CPU and RAM requirements. Check the containerOverrides section for "cpu", "memory", "memoryReservation" – https://docs.aws.amazon.com/cli/latest/reference/ecs/run-task.html – GTO Jul 12 '19 at 03:53
  • @GTO does this line https://github.com/ishan4488/airflow-fargate-example/blob/aa15287b36545f0a19438913f59ce0f02f831950/airflow-fargate-example/airflowDag.py#L28 mean I have to create a task called "my_automation_task" in ECS in advance? And do I have to ensure that the names in Python and in ECS match? – xliiv Feb 06 '20 at 10:56
  • @xliiv yes. You will have to create a task in ECS and specify the name here. – GTO Mar 08 '20 at 11:24

You can use Apache Airflow DAG operators with any cloud provider, not only GKE.

The articles Airflow-on-kubernetes-part-1-a-different-kind-of-operator and Airflow Kubernetes Operator provide basic examples of how to use DAGs.

The article Explore Airflow KubernetesExecutor on AWS and kops also provides a good explanation, with an example of how to use the airflow-dags and airflow-logs volumes on AWS.
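
The volume setup from that article boils down to a few settings in airflow.cfg. A rough sketch, assuming Airflow 1.10's [kubernetes] section and pre-created PersistentVolumeClaims named airflow-dags and airflow-logs (the worker image is a placeholder):

[core]
executor = KubernetesExecutor

[kubernetes]
# placeholder worker image; build it from your own Airflow setup
worker_container_repository = your-repo/airflow-worker
worker_container_tag = latest
namespace = airflow
in_cluster = True
# pre-created PersistentVolumeClaims for DAG files and task logs
dags_volume_claim = airflow-dags
logs_volume_claim = airflow-logs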

Example DAG:

from datetime import datetime

from airflow.models import DAG
from airflow.operators.python_operator import PythonOperator

args = {
    'owner': 'airflow',
    'start_date': datetime(2018, 10, 4),
}

dag = DAG(
    dag_id='test_kubernetes_executor',
    default_args=args,
    schedule_interval=None,  # trigger manually
)

def print_stuff():
    print("Hi Airflow")

# Two parallel chains of three tasks each; with the KubernetesExecutor,
# every task instance is launched in its own pod.
for i in range(2):
    one_task = PythonOperator(
        task_id='one_task' + str(i),
        python_callable=print_stuff,
        dag=dag,
    )

    second_task = PythonOperator(
        task_id='second_task' + str(i),
        python_callable=print_stuff,
        dag=dag,
    )

    third_task = PythonOperator(
        task_id='third_task' + str(i),
        python_callable=print_stuff,
        dag=dag,
    )

    one_task >> second_task >> third_task
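
And since the question asks about the KubernetesPodOperator specifically: that operator is cluster-agnostic, so the same DAG runs on EKS just as it does on GKE, as long as the scheduler can reach the cluster. A rough sketch, assuming Airflow 1.10's contrib operator (the image, namespace, and pod name are placeholders):

from airflow.contrib.operators.kubernetes_pod_operator import KubernetesPodOperator

pod_task = KubernetesPodOperator(
    task_id='pod_task',
    dag=dag,                          # reuse the DAG defined above
    name='airflow-pod-example',       # pod name (lowercase, DNS-safe)
    namespace='default',              # placeholder namespace
    image='python:3.6',               # placeholder image; any Docker image works
    cmds=['python', '-c'],
    arguments=['print("Hi from a pod")'],
    in_cluster=True,                  # use the Airflow pod's service account
    get_logs=True,                    # stream pod logs into the task log
)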
Vit
  • Thank you VKR for the response. I was hoping to do something very similar to the article https://medium.com/@chengzhizhao/explore-airflow-kubernetesexecutor-on-aws-and-kops-1c4dd33e56e0, with one difference: rather than running the tasks on kops on EC2 machines, would it be possible to run them against EKS/Fargate? – maldman Feb 05 '19 at 16:12
  • Yes, why not. This is a multi-platform thing, and as I mentioned above, you should be able to run it successfully on AWS – Vit Feb 05 '19 at 16:14