Since QuickSight can directly query S3, when would we need to use Athena as data source for QuickSight?

Question

Maybe I am missing something, but I don't understand what benefit I would get from connecting QuickSight to Athena instead of connecting QuickSight directly to S3. Please help me understand this.

score 11 · Accepted Answer · answered Nov 17 '17 at 16:26


Amazon S3 is an object storage service built to store and retrieve any amount of data. For this use case, it holds raw or unstructured data in some file format (e.g. .csv or .tsv).

Amazon Athena uses a managed Data Catalog to store information and schemas about the databases and tables that you create for your data stored in Amazon S3. So, Athena knows about the data and its structure (i.e. some schema) in S3.
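To make the "schema over S3 files" idea concrete, here is a minimal sketch that builds the kind of `CREATE EXTERNAL TABLE` statement Athena uses to register comma-delimited files in its catalog. The helper name `build_external_table_ddl`, the database, table, columns and bucket path are all hypothetical, and the sketch assumes CSV files with a single header row. It only produces the DDL string; running it in Athena is left out.

```python
def build_external_table_ddl(database, table, columns, s3_location):
    """Return an Athena DDL statement describing comma-delimited files in S3.

    `columns` is a list of (name, athena_type) pairs. All names used in the
    example call below are hypothetical.
    """
    column_defs = ",\n".join(f"  `{name}` {col_type}" for name, col_type in columns)
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS `{database}`.`{table}` (\n"
        f"{column_defs}\n"
        ")\n"
        "ROW FORMAT DELIMITED FIELDS TERMINATED BY ','\n"
        f"LOCATION '{s3_location}'\n"
        # Assumption: each file starts with one header row that should be skipped.
        "TBLPROPERTIES ('skip.header.line.count'='1')"
    )


ddl = build_external_table_ddl(
    database="sales_db",
    table="orders",
    columns=[("order_id", "string"), ("amount", "double")],
    s3_location="s3://example-bucket/orders/",
)
print(ddl)
```

The data itself never moves: the statement only records where the files live and how to parse them.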

Also, QuickSight can connect directly to an Athena database and query the data for analysis. When you connect to an Athena database, you are most likely handling structured or semi-structured data.

Amazon S3 Manifest Files are not required when the data source is Amazon Athena.
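For comparison, the direct S3 route does need a manifest that lists the files to import. The sketch below builds a small one in the JSON layout QuickSight manifests use (`fileLocations` plus `globalUploadSettings`). The helper name `build_manifest` and the example URI are hypothetical, and only a few of the upload settings are shown.

```python
import json


def build_manifest(uris, file_format="CSV", delimiter=",", contains_header=True):
    """Return a QuickSight-style S3 manifest as a JSON string.

    `uris` is a list of S3 object URLs. The example URL below is hypothetical.
    """
    manifest = {
        "fileLocations": [{"URIs": list(uris)}],
        "globalUploadSettings": {
            "format": file_format,
            "delimiter": delimiter,
            # The manifest format expects the flag as a string, not a JSON boolean.
            "containsHeader": "true" if contains_header else "false",
        },
    }
    return json.dumps(manifest, indent=2)


print(build_manifest(["https://s3.amazonaws.com/example-bucket/orders.csv"]))
```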

Some limitations when connecting to S3 directly:

No file specified in the manifest can exceed 1 GB in size, the total size of all the files specified can't exceed 10 GB, and the total number of files specified can't exceed 1000.

These limitations do not apply when you create the data set using Amazon Athena data.
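As a rough illustration, the limits quoted above can be checked before building a manifest. This is only a sketch: the figures are the ones stated in this answer and may not match current QuickSight quotas, the function name `manifest_limit_violations` is hypothetical, and treating 1 GB as 1024³ bytes is an assumption.

```python
GB = 1024 ** 3  # Assumption: binary gigabytes; the answer does not say which unit applies.

# Limits as quoted in this answer; they may have changed since it was written.
MAX_FILE_BYTES = 1 * GB
MAX_TOTAL_BYTES = 10 * GB
MAX_FILE_COUNT = 1000


def manifest_limit_violations(file_sizes):
    """Return a list of human-readable violations for a {file_name: size_in_bytes} dict."""
    violations = []
    if len(file_sizes) > MAX_FILE_COUNT:
        violations.append(f"{len(file_sizes)} files exceeds the {MAX_FILE_COUNT}-file limit")
    total = sum(file_sizes.values())
    if total > MAX_TOTAL_BYTES:
        violations.append(f"total size {total} bytes exceeds {MAX_TOTAL_BYTES} bytes")
    for name, size in sorted(file_sizes.items()):
        if size > MAX_FILE_BYTES:
            violations.append(f"{name} is {size} bytes, over the {MAX_FILE_BYTES}-byte per-file limit")
    return violations


# Hypothetical file listing: one small file and one oversized file.
for problem in manifest_limit_violations({"small.csv": GB // 2, "big.csv": 2 * GB}):
    print(problem)
```

An empty list means the file set fits within the quoted limits; anything else is a hint that querying through Athena may be the easier route.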

Another feature when creating a data set using Amazon Athena data:

You can either query the data directly, without loading it into SPICE, or load it into SPICE and then analyze it.

Conclusion:

If you have not done anything with your S3 files, you can just go ahead and use QuickSight with S3 as the data set.

If you have loaded the S3 data into Athena, then you can use Athena as data set for QuickSight.

By using Athena or any other data source, you get a few benefits and can overcome some of the limitations (e.g. file size) mentioned above.


notionquest


super. Many Thanks :) – Anand Shaw Nov 18 '17 at 20:01

Please can you explain "if you have loaded the S3 data into Athena, then you can use Athena as data set for QuickSight"? My understanding is that the results of a query run through Athena get stored in an S3 bucket (example results S3 bucket name: `results`). This means that you cannot load anything into Athena. When Athena is set as the data set for QuickSight, QuickSight calls Athena, which runs the query against the `source S3 bucket` and stores the results in the `results S3 bucket`. QuickSight then displays the charts/results based on the values in the `results S3 bucket`. – variable Nov 19 '20 at 08:36
Are these S3 limitations (1GB per file, 10GB total) still valid? Don't see them here: https://docs.aws.amazon.com/quicksight/latest/user/data-source-limits.html – chaooder Dec 15 '20 at 15:59
Correct that you don't 'store' data in Athena. It just queries the files natively in S3 (true, the query results are also stored in S3, but that's not really relevant to the data you want to use in your dashboards). When connecting to Athena in QuickSight you have the option of using Direct Query (live connection) or importing into SPICE, whereas the S3 connector always imports data into SPICE. You also get the ability to use Custom SQL with Athena, so you can do some data transformations too. Lastly, Athena supports querying Parquet, Avro etc., whereas you cannot import Parquet files using the S3 connector. – dondata Apr 26 '22 at 04:03
I don't think there are any specific file size limits with S3... just the 1k files per dataset limit (an S3 manifest limitation). Overall I think using Athena to query your S3 data is more flexible. – dondata Apr 26 '22 at 04:07
