We are setting up a data platform loosely based on the Data Lake architecture. We are evaluating candidates that provide a centralized data catalog, metadata management, and tagging. Glue seems very promising, but it is not yet publicly available, so we looked into:
- Ground
- Waterline
- Zaloni
Ground is fairly DIY. It seems we would have to extend it extensively to make it work for us (for example, scavenging metadata from S3 and writing it to Titan).
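To make the kind of extension work concrete, here is a rough sketch of the S3 scavenging step as we picture it. It is not Ground's API: `CatalogEntry` and `scavenge` are hypothetical names, and the only real-world assumption is that each input record carries the `Key` and `Size` fields that an S3 object listing returns. The Titan write is left out entirely.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List


@dataclass
class CatalogEntry:
    """Hypothetical catalog record; not Ground's actual metamodel."""
    dataset: str       # first path segment of the object key
    location: str      # full s3:// URI of the object
    size_bytes: int
    tags: Dict[str, str] = field(default_factory=dict)


def scavenge(bucket: str, objects: Iterable[dict]) -> List[CatalogEntry]:
    """Turn S3 listing records (dicts with "Key" and "Size") into catalog entries."""
    entries = []
    for obj in objects:
        key = obj["Key"]
        if key.endswith("/"):
            # Skip zero-byte "folder" placeholder objects.
            continue
        dataset = key.split("/", 1)[0]
        filename = key.rsplit("/", 1)[-1]
        # Tag by file extension; assumed to be a good-enough format hint.
        fmt = filename.rsplit(".", 1)[-1].lower() if "." in filename else "unknown"
        entries.append(CatalogEntry(
            dataset=dataset,
            location=f"s3://{bucket}/{key}",
            size_bytes=obj["Size"],
            tags={"format": fmt},
        ))
    return entries
```

In practice the `objects` argument would come from paging through a bucket listing, and each entry would then be written to the graph store; both of those pieces are what we would have to build ourselves.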
Waterline and Zaloni are packaged, full-blown solutions that might not be what we are looking for, since we prefer open-source point solutions.
Are there any alternatives that we should look at? We like the MetaModel available in Ground and are leaning towards using it together with Kinesis schema management.