We are planning to source data from an S3 bucket in another AWS account using Amazon Redshift Spectrum. However, the source team informed us that the bucket key will change every day, and the latest data will be available under the key with the latest timestamp. Can anyone suggest the best way to create this external table?

Rajib Kar
  • Hello! Welcome to StackOverflow! Please read [How do I ask a good question?](https://stackoverflow.com/help/how-to-ask) in the [Help Center](https://stackoverflow.com/help). – Vlad Mar 12 '19 at 09:02
  • I think you will need to recreate the spectrum table every day. – Jon Scott Mar 12 '19 at 10:14

1 Answer

An external table in Spectrum can either be configured to point to a prefix in S3 (somewhat like a folder in a normal filesystem), or it can use a manifest file that lists exactly which files the table should consist of (they can even reside in different S3 buckets).

So you will have to create the table every day and point it to the correct location. If all the files end up under the same S3 prefix, you will have to use a manifest file to specify the current ones.
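A rough sketch of that daily step, under some assumptions: the keys look like `exports/<timestamp>/file` with timestamps that sort lexicographically, and the bucket, table, and function names are all made up for illustration. It picks the newest prefix, builds a manifest in the format Spectrum expects (each entry needs `meta.content_length`), and generates an `ALTER TABLE ... SET LOCATION` statement, which Redshift accepts for external tables and which may save you the drop-and-recreate.

```python
import json


def latest_prefix(keys):
    """Return the timestamped prefix that sorts last.

    Assumes keys look like 'exports/<timestamp>/file' and that the
    timestamps sort lexicographically (e.g. YYYYMMDD).
    """
    prefixes = {key.rsplit("/", 1)[0] + "/" for key in keys}
    return max(prefixes)


def build_manifest(bucket, objects, prefix):
    """Build a Spectrum manifest for the objects under `prefix`.

    `objects` maps S3 key -> size in bytes; Spectrum manifests need
    the size of every file in meta.content_length.
    """
    entries = [
        {"url": f"s3://{bucket}/{key}", "meta": {"content_length": size}}
        for key, size in sorted(objects.items())
        if key.startswith(prefix)
    ]
    return json.dumps({"entries": entries}, indent=2)


def set_location_sql(table, location):
    """Statement that repoints an existing external table (hypothetical names)."""
    return f"ALTER TABLE {table} SET LOCATION '{location}';"
```

In practice you would fill `objects` from an S3 listing, upload the manifest to S3, and run the generated statement against Redshift. If the source simply writes each day's files to a new prefix, you can skip the manifest and pass the prefix itself as the location.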

A hint not directly related to the question: you could also create a table every day with a timestamp in its name, and each day create a view pointing to the latest table. This makes it easy to look at historical data or, if you use the data for e.g. machine learning, to pin the input to an immutable version of the data so that you can reproducibly fetch training data. Of course, this depends on your requirements.
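A minimal sketch of the view idea, assuming a hypothetical external schema `spectrum`, a base table name `orders`, and a view called `public.orders_latest`. Note that Redshift requires views over external tables to be created `WITH NO SCHEMA BINDING`.

```python
from datetime import date


def daily_table_name(base, day):
    """Name of the dated table, e.g. orders_20190312 (naming scheme is an assumption)."""
    return f"{base}_{day:%Y%m%d}"


def latest_view_sql(schema, base, day, view):
    """Statement that repoints `view` at the table created for `day`.

    Views that reference external tables must be late-binding,
    hence WITH NO SCHEMA BINDING.
    """
    table = daily_table_name(base, day)
    return (
        f"CREATE OR REPLACE VIEW {view} AS "
        f"SELECT * FROM {schema}.{table} "
        "WITH NO SCHEMA BINDING;"
    )
```

Queries that always want the newest data read from the view, while anything that needs a fixed snapshot (such as a training job) reads the dated table directly.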

botchniaque