I have been setting up data lakes for clients, in which we load data from on-premises or other sources into S3 (the data lake). We then create an AWS Glue Data Catalog over this raw data to define schemas.
The next step is to use either EMR or AWS Glue for data cleansing, and then load the transformed data into RDS, Redshift, or S3 as the final target.
The jobs can be scheduled using AWS Data Pipeline, Glue jobs, or an AWS Lambda event trigger, depending on the use case and the service used.
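To make the Lambda event trigger option concrete, here is a minimal sketch of a handler that starts a Glue job when a new object lands in the raw S3 bucket. The job name and the `--source_bucket` / `--source_key` argument names are hypothetical placeholders, not taken from any real setup. `start_job_run` is the boto3 Glue client call; the optional `glue_client` parameter is only there so the function can be exercised without AWS credentials.

```python
import urllib.parse

# Hypothetical Glue job name, used for illustration only.
JOB_NAME = "raw-to-curated-cleansing"


def build_job_arguments(event):
    """Map the first record of an S3 event notification to Glue job arguments."""
    record = event["Records"][0]
    bucket = record["s3"]["bucket"]["name"]
    # Object keys in S3 event notifications are URL-encoded (spaces arrive as '+').
    key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
    return {"--source_bucket": bucket, "--source_key": key}


def lambda_handler(event, context, glue_client=None):
    """Start the Glue job for the uploaded object and return the job run id."""
    if glue_client is None:
        import boto3  # imported lazily so the helper above works without boto3

        glue_client = boto3.client("glue")
    response = glue_client.start_job_run(
        JobName=JOB_NAME,
        Arguments=build_job_arguments(event),
    )
    return response["JobRunId"]
```

A time-based schedule (Data Pipeline or a Glue trigger) would replace the S3 event with a cron-style definition; the sketch above only covers the event-driven case.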
Analysts and other users are given access to the required data or S3 buckets through IAM, for QuickSight visualizations, for querying the data with Athena, Drill, etc., or for ML applications in SageMaker.
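As a rough illustration of the IAM-based access described above, the following sketch builds a read-only policy document for a single prefix in the data lake bucket. The bucket name and prefix are hypothetical, and the policy covers only the S3 read side; a real Athena or QuickSight user would also need permissions on the Glue Data Catalog and on a query results location.

```python
import json


def read_only_prefix_policy(bucket, prefix):
    """Build an IAM policy document granting read-only access to one S3 prefix."""
    prefix = prefix.strip("/")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Listing is granted on the bucket, restricted to the prefix.
                "Sid": "ListPrefix",
                "Effect": "Allow",
                "Action": "s3:ListBucket",
                "Resource": f"arn:aws:s3:::{bucket}",
                "Condition": {"StringLike": {"s3:prefix": [f"{prefix}/*"]}},
            },
            {
                # Reading is granted on the objects under the prefix.
                "Sid": "ReadObjects",
                "Effect": "Allow",
                "Action": "s3:GetObject",
                "Resource": f"arn:aws:s3:::{bucket}/{prefix}/*",
            },
        ],
    }


if __name__ == "__main__":
    # Hypothetical bucket and prefix, for illustration only.
    policy = read_only_prefix_policy("example-data-lake", "curated/sales")
    print(json.dumps(policy, indent=2))
```

The finest granularity available this way is the object or prefix, which is why column-level restrictions are hard to express with S3 and IAM alone.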
My question is: how is AWS Lake Formation different from the traditional data lake setup described above?
Is it correct to say that AWS Lake Formation makes all of the above services (S3, Glue Data Catalog, the ETL code generator in Glue, job scheduling, etc.) available in a single console, along with more advanced security for users and data (record/column level) that can be configured from within the Lake Formation console?
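For comparison, a column-level grant in Lake Formation can also be issued through the API rather than the console. The sketch below builds the request for the boto3 `lakeformation` client's `grant_permissions` call using a `TableWithColumns` resource. The principal ARN, database, table, and column names are hypothetical, and the optional client parameter exists only so the code can be tried without AWS access.

```python
def column_grant_request(principal_arn, database, table, columns):
    """Build grant_permissions arguments for SELECT on specific columns only."""
    return {
        "Principal": {"DataLakePrincipalIdentifier": principal_arn},
        "Resource": {
            "TableWithColumns": {
                "DatabaseName": database,
                "Name": table,
                "ColumnNames": list(columns),
            }
        },
        "Permissions": ["SELECT"],
    }


def grant_columns(request, lakeformation_client=None):
    """Send the grant to Lake Formation (creates a boto3 client if none is given)."""
    if lakeformation_client is None:
        import boto3  # imported lazily so the request builder works without boto3

        lakeformation_client = boto3.client("lakeformation")
    return lakeformation_client.grant_permissions(**request)


if __name__ == "__main__":
    # Hypothetical analyst role and table, for illustration only.
    request = column_grant_request(
        "arn:aws:iam::111122223333:role/analyst",
        "curated",
        "sales",
        ["order_id", "order_date", "amount"],
    )
    print(request)
```

Here the permission is attached to catalog tables and columns instead of S3 paths, which is the main contrast with the IAM-only approach.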
Is there anything else that makes Lake Formation stand out from a traditional cloud-based data lake?
Thanks