I have a Flask-based application built on a microservice architecture.
- I have multiple services, such as Scrapy for scraping product data.
- Multiple API integrations with different service providers to pull and push data.
- ETL processes to massage and insert/update information.
- Email processes to send email to customers, either for marketing or for event notifications such as "your report is ready".
Most of these tasks are orchestrated by a Celery worker app. I now have many tasks that need to run on a recurring schedule. They could easily be scheduled with:
- Celery Beat
- CronTab
- Other alternatives.
I have been looking at Apache Airflow with its Celery Executor. From a strategic perspective, does it make sense to adopt Airflow, or should I build my own scheduling setup with one of the options above to run the recurring tasks?
The positives of Airflow:
- Great GUI
- DAGs can be defined to ensure task A is completed before task B begins. (Example: Scrapy gets product data and creates a CSV file; once that task is completed, the ETL script processes the data.)
- Automatic task management.
The negatives of Airflow:
- Celery is still the ultimate orchestrator of the tasks. I think Airflow would add too many layers to the core solution in exchange for a great GUI.
- Chaining tasks in Celery can already ensure that task A is completed before task B starts.
- Need to maintain additional servers, workers, and schedulers.
I would love your thoughts on whether you would use Apache Airflow, or whether Celery alone is enough.