I have a bunch of data in files stored in Amazon S3 and am planning to use it to build a Data Vault in Redshift. My first question is if the right approach is to build the DV and Data Marts all in Redshift or if I should consider the S3 as my Data Lake and have only the Data Marts in Redshift?
In my architecture I'm currently considering the former (i.e. S3 Data Lake + Redshift Vault and Marts). However, I don't know if I can create ETL processes directly in Redshift to populate the Marts with data in the Vault or if I'll have to for example use Amazon EMR to process the raw data in S3, generate new files there and finally load them in the Marts.
So, my second question is: What should the ETL strategy be? Thanks.