Apache Falcon - Feed management and data processing platform
From the docs:
Falcon is a feed processing and feed management system aimed at making it easier for end consumers to onboard their feed processing and feed management on Hadoop clusters.
Why?
- Establishes relationships between the various data and processing elements in a Hadoop environment
- Provides feed management services such as feed retention, replication across clusters, and archival
- Makes it easy to onboard new workflows/pipelines, with support for late data handling and retry policies
- Integrates with metastores/catalogs such as Hive/HCatalog
- Notifies end customers based on the availability of feed groups (logical groups of related feeds that are likely to be used together)
- Enables use cases such as local processing within a colo (data center) and global aggregation across colos
- Captures lineage information for feeds and processes
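
To make the feed retention idea from the list above concrete, here is a minimal Python sketch of the underlying logic: given the timestamps of a feed's instances and a retention limit, pick out the instances that have aged out. This is only a conceptual illustration and not Falcon's API; Falcon itself is configured declaratively rather than through code like this. The function name `expired_instances` and the sample data are hypothetical.

```python
from datetime import datetime, timedelta


def expired_instances(instance_times, now, retention_limit):
    """Return the feed instances that fall outside the retention window.

    Hypothetical helper for illustration only; not part of Falcon.
    """
    cutoff = now - retention_limit
    # An instance is expired if it is strictly older than the cutoff.
    return [t for t in instance_times if t < cutoff]


# Hypothetical hourly feed: six instances, from "now" back to five hours ago.
now = datetime(2015, 1, 1, 12, 0)
instances = [now - timedelta(hours=h) for h in range(6)]

# Keep three hours of data; anything older is eligible for deletion.
to_delete = expired_instances(instances, now, timedelta(hours=3))
for t in to_delete:
    print("would delete instance:", t.isoformat())
```

In a system like Falcon, a check of this kind would run periodically per feed and per cluster, so that each cluster can keep data for a different length of time.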