What are the minimum hardware requirements for setting up an Apache Airflow cluster.
Eg. RAM, CPU, Disk etc for different types of nodes in the cluster.
What are the minimum hardware requirements for setting up an Apache Airflow cluster.
Eg. RAM, CPU, Disk etc for different types of nodes in the cluster.
I have had no issues using very small instances in pseudo-distributed mode (32 parallel workers; Postgres backend):
If you want distributed mode, you should be more than fine with that if you keep it homogenous. Airflow shouldn't really do heavy lifting anyways; push the workload out to other things (Spark, EMR, BigQuery, etc).
You will also have to run some kind of messaging queue, like RabbitMQ. I think they take Redis too. However, this doesn't really dramatically impact how you size.
We are running the airflow in AWS with below config
t2.small --> airflow scheduler and webserver
db.t2.small --> postgres for metastore
The parallelism parameter in airflow.cfg is set to 10 and there are around 10 users who access airflow UI
All we do from airflow is ssh to other instances and run the code from there