0

I was reading the AWS Athena Federated Query page and learned that "you can use Athena Federated Query (Preview) to query the data in place or build pipelines that extract data from multiple data sources and store them in Amazon S3."

If I have a huge amount of data outside my AWS account, do I still have to transfer it to S3 before I can query it with Athena?

Please share your experience. Thanks.

Amit Dass

3 Answers

2

If I have a huge amount of data outside my AWS account, do I still have to transfer it to S3 before I can query it with Athena?

No, you don't need to transfer data to S3 to query it with Athena Federated Query. You can just connect your external sources and query them. However, the result of your query will always be saved to S3.

This makes it a relatively easy way to extract, transform, and load data from external sources into S3 (for example, if you want to use it with other services within AWS).

With Athena Federated Query there is no need to build complicated ETL workflows anymore. Just query your external data source and the result is in S3.
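
As a rough illustration of the point above, here is a minimal sketch of what submitting such a query could look like with boto3. It assumes a connector has already been registered in Athena as a data source; the catalog, database, table, and bucket names are hypothetical placeholders, and the helper functions are made up for this example. The only AWS call used is Athena's `start_query_execution`, and the S3 `OutputLocation` is where the query result gets written.

```python
from typing import Any, Dict


def build_federated_query_request(
    catalog: str,
    database: str,
    table: str,
    output_location: str,
    limit: int = 10,
) -> Dict[str, Any]:
    """Build keyword arguments for Athena's StartQueryExecution call.

    `catalog` is assumed to be the name under which the external
    connector was registered as an Athena data source.
    """
    # Address the external table as "catalog"."database"."table".
    query = f'SELECT * FROM "{catalog}"."{database}"."{table}" LIMIT {limit}'
    return {
        "QueryString": query,
        # The source data stays where it is; only the query result
        # is written to this S3 location.
        "ResultConfiguration": {"OutputLocation": output_location},
    }


def run_query(request: Dict[str, Any]) -> str:
    """Submit the query and return its execution id (needs AWS credentials)."""
    import boto3  # imported lazily so the builder above works without boto3

    client = boto3.client("athena")
    response = client.start_query_execution(**request)
    return response["QueryExecutionId"]
```

Calling `run_query(build_federated_query_request("my_dynamo_catalog", "default", "orders", "s3://my-results-bucket/athena/"))` would start the query, with all four argument values standing in for your own names.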

0

Athena supports additional data sources. Amazon provides a list of these data source connectors, with documentation on exactly how to set up each one.

It's worth stating that if you want Athena to be performant and secure, transferring data over the internet works against both.

Chris Williams
  • Thanks for the response. I am just curious why the AWS documentation mentions that these connectors let us connect to external sources, yet also talks about fetching data into S3, which would add S3 storage cost again. – Amit Dass May 26 '20 at 14:00
  • I guess it has primarily been optimised to work with S3, with perhaps fewer features on the other connectors. – Chris Williams May 26 '20 at 14:02

0

You do not need to transfer the data to S3. For example, you can query data in DynamoDB directly with the Lambda-based connector that AWS provides ready to use.

One thing worth mentioning: Athena's query timeout is 30 minutes, but if you are going to use connectors that run on Lambda, keep in mind that the maximum timeout for a Lambda function is 15 minutes.
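
To make the Lambda limit concrete, here is a small sketch that compares a connector function's configured timeout with the 15-minute cap. The helper names and the function name in the usage note are hypothetical; the sketch relies only on the `Timeout` field (in seconds) returned by Lambda's `get_function_configuration` call.

```python
from typing import Any, Dict

# Lambda's maximum timeout per invocation: 15 minutes, in seconds.
LAMBDA_MAX_TIMEOUT_SECONDS = 15 * 60


def timeout_headroom(function_config: Dict[str, Any]) -> int:
    """Return how many seconds the configured timeout is below the cap."""
    return LAMBDA_MAX_TIMEOUT_SECONDS - int(function_config["Timeout"])


def fetch_function_config(function_name: str) -> Dict[str, Any]:
    """Fetch the connector function's configuration (needs AWS credentials)."""
    import boto3  # imported lazily so timeout_headroom works without boto3

    client = boto3.client("lambda")
    return client.get_function_configuration(FunctionName=function_name)
```

For example, `timeout_headroom(fetch_function_config("my-athena-connector"))` returning a positive number would mean the connector's timeout could still be raised, while a result of zero would mean it is already at the maximum.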