Questions tagged [amazon-athena]

Amazon Athena is a service for running SQL queries against data stored in files on Amazon S3. Amazon Athena is part of Amazon Web Services (AWS).

Athena is powered by the Presto query engine and uses Apache Hive Metastore for database and table definitions. It supports both dynamic and static partitions for tables. Athena supports data stored in delimited text files, JSON, ORC, Avro, and Parquet.

Athena is serverless: there is no infrastructure to manage, and cost is based on the amount of data scanned by each query.
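As a rough illustration of scan-based billing, here is a minimal sketch that estimates one query's cost from the bytes it scanned. The $5-per-TB rate, the 10 MB per-query minimum, rounding up to whole megabytes, and the use of binary units are assumptions based on Athena's long-standing published pricing. Check current pricing for your region before relying on the numbers.

```python
import math

# Assumed pricing inputs; confirm them against current Athena pricing for your region.
PRICE_PER_TB_USD = 5.00
MB = 1024 ** 2
TB = 1024 ** 4
MIN_BILLED_BYTES = 10 * MB  # assumed 10 MB minimum charge per query


def estimate_query_cost(bytes_scanned: int, price_per_tb: float = PRICE_PER_TB_USD) -> float:
    """Estimate the USD cost of one query from the number of bytes it scanned."""
    # Round up to a whole megabyte, then apply the per-query minimum.
    billed = max(math.ceil(bytes_scanned / MB) * MB, MIN_BILLED_BYTES)
    return billed / TB * price_per_tb


print(estimate_query_cost(TB))  # 5.0
print(estimate_query_cost(1))   # any query under 10 MB costs the minimum charge
```

Because the charge follows bytes scanned rather than rows returned, columnar formats such as ORC and Parquet, compression, and partition pruning are the usual levers for lowering it.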

See the Athena Documentation for more.

3440 questions
11 votes · 5 answers

Offloading data files from Amazon Redshift to Amazon S3 in Parquet format

I would like to unload data files from Amazon Redshift to Amazon S3 in Apache Parquet format in order to query the files on S3 using Redshift Spectrum. I have explored everywhere but I couldn't find anything about how to offload the files from…
Teja · 13,214 reputation
11 votes · 3 answers

AWS Athena MSCK REPAIR TABLE takes too long for a small dataset

I am having issues with Amazon Athena. I have a small bucket (36,430 objects, 9.7 MB) with 4 levels of partitions (my-bucket/p1=ab/p2=cd/p3=ef/p4=gh/file.csv), but when I run the command MSCK REPAIR TABLE db.table it takes over 25 minutes, and I…
JorgeGarza · 414 reputation
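One commonly suggested alternative to MSCK REPAIR TABLE is registering partitions explicitly with `ALTER TABLE ... ADD IF NOT EXISTS PARTITION (...) LOCATION '...'`. The minimal sketch below parses Hive-style `key=value` segments from object keys and builds one such statement. It assumes the keys are already listed, relative to the bucket. The bucket, table, and key layout are the hypothetical ones from the question. The statement is only generated here, not executed.

```python
from typing import Dict, Iterable, List


def partition_values(key: str) -> Dict[str, str]:
    """Return the Hive-style key=value segments found in an S3 object key's directory part."""
    values: Dict[str, str] = {}
    for segment in key.split("/")[:-1]:  # the last segment is the file name
        if "=" in segment:
            name, value = segment.split("=", 1)
            values[name] = value
    return values


def add_partitions_sql(table: str, bucket: str, keys: Iterable[str]) -> str:
    """Build one ALTER TABLE ... ADD IF NOT EXISTS statement for every distinct partition prefix."""
    prefixes: Dict[str, Dict[str, str]] = {}
    for key in keys:
        values = partition_values(key)
        if values:
            directory = key.rsplit("/", 1)[0]
            prefixes.setdefault(directory, values)
    if not prefixes:
        raise ValueError("no partitioned object keys found")
    clauses: List[str] = []
    for directory, values in sorted(prefixes.items()):
        # Values are inserted verbatim; this sketch does not escape quotes inside them.
        spec = ", ".join(f"{name}='{value}'" for name, value in values.items())
        clauses.append(f"PARTITION ({spec}) LOCATION 's3://{bucket}/{directory}/'")
    return f"ALTER TABLE {table} ADD IF NOT EXISTS\n  " + "\n  ".join(clauses)


keys = ["p1=ab/p2=cd/p3=ef/p4=gh/file.csv", "p1=ab/p2=cd/p3=ef/p4=gh/file2.csv"]
print(add_partitions_sql("db.table", "my-bucket", keys))
```

Files that share a directory collapse into a single PARTITION clause, so the statement grows with the number of partitions rather than the number of objects.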
11 votes · 1 answer

Converting Unix epoch time to extended ISO8601

I have 3 tables I would like to work with by date; however, one of the tables stores the date in Unix epoch format. Here is an example of the 3 fields: Table1: 2017-02-01T07:58:40.756031Z Table2: 2017-02-07T10:16:46Z Table3: 1489236559 I…
Kelly Norton · 487 reputation
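Inside Athena, Presto's `from_unixtime()` and `from_iso8601_timestamp()` functions are the usual tools for this. The minimal Python sketch below only illustrates the conversion. It normalizes the three sample values from the question to the same UTC ISO 8601 form at second precision, which drops Table1's fractional seconds.

```python
from datetime import datetime, timezone
from typing import Union


def to_iso8601_utc(value: Union[int, str]) -> str:
    """Normalize Unix epoch seconds or an ISO 8601 string to 'YYYY-MM-DDTHH:MM:SSZ' in UTC."""
    if isinstance(value, int):
        dt = datetime.fromtimestamp(value, tz=timezone.utc)
    else:
        # fromisoformat() before Python 3.11 does not accept a trailing 'Z'.
        dt = datetime.fromisoformat(value.replace("Z", "+00:00")).astimezone(timezone.utc)
    return dt.strftime("%Y-%m-%dT%H:%M:%SZ")


print(to_iso8601_utc("2017-02-01T07:58:40.756031Z"))  # 2017-02-01T07:58:40Z
print(to_iso8601_utc("2017-02-07T10:16:46Z"))         # 2017-02-07T10:16:46Z
print(to_iso8601_utc(1489236559))                     # 2017-03-11T12:49:19Z
```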
11 votes · 2 answers

Time diff in Amazon Athena / Presto (seconds and minutes)

I have a list of creation timestamps and ending timestamps, and I would like to get the number of seconds elapsed from creation to ending. I could not find any way to do that without using a UNIX timestamp (which I don't have at the moment). Something…
Latent · 556 reputation
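Presto, and therefore Athena, provides `date_diff(unit, timestamp1, timestamp2)`, for example `date_diff('second', created_at, ended_at)`. The column names there are hypothetical. The minimal Python sketch below shows the same arithmetic on two parsed timestamps, with no Unix epoch values involved. The timestamp string format is an assumption.

```python
from datetime import datetime
from typing import Tuple

# Assumed storage format of the timestamps; adjust to match the real data.
FMT = "%Y-%m-%d %H:%M:%S"


def elapsed(created: str, ended: str) -> Tuple[int, int]:
    """Return (whole seconds, whole minutes) from created to ended, assuming ended >= created."""
    delta = datetime.strptime(ended, FMT) - datetime.strptime(created, FMT)
    seconds = int(delta.total_seconds())
    return seconds, seconds // 60


print(elapsed("2017-03-01 10:00:00", "2017-03-01 10:02:30"))  # (150, 2)
```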
11 votes · 2 answers

Nested Query Alternatives in AWS Athena

I am running a query that gives a non-overlapping set of first_party_ids: IDs that are associated with one third party but not another. This query does not run in Athena, however, giving the error: Correlated queries not yet supported. Was…
pauld · 401 reputation
10 votes · 1 answer

AWS Athena: Querying by an attribute of a struct with an array

I crawled data using AWS Glue to import JSON data from an S3 folder that contains data where the root element is an array, like this: [{id: '1', name: 'rick'},{id: '2', name: 'morty'}] This ends up resulting in a schema like this:…
cosbor11 · 14,709 reputation
10 votes · 2 answers

Amazon Athena: Querying columns with numbers stored as strings

I have an insurance dataset which includes the number of enrollments for each county. However, the number of enrollments is stored as a string. How can I query the data for something like "Find the plans which have an enrollment of more than 50".…
caliGeek · 409 reputation
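In Athena the usual approach is to cast the column inside the predicate, e.g. `WHERE TRY_CAST(enrollment AS integer) > 50`. The column name there is hypothetical. Presto's `try_cast` yields NULL instead of failing on non-numeric values. The minimal Python sketch below roughly mirrors that behaviour on a few made-up rows. It also shows why comparing the raw strings gives wrong answers.

```python
from typing import Dict, List, Optional


def try_int(text: str) -> Optional[int]:
    """Roughly mimic TRY_CAST(text AS integer): None instead of an error for non-numeric input."""
    try:
        return int(text)
    except (TypeError, ValueError):
        return None


def plans_with_enrollment_over(rows: List[Dict[str, str]], threshold: int = 50) -> List[str]:
    """Keep plans whose enrollment, cast to an integer, exceeds the threshold."""
    result = []
    for row in rows:
        count = try_int(row["enrollment"])
        if count is not None and count > threshold:
            result.append(row["plan"])
    return result


# Hypothetical rows. Comparing the raw strings would be wrong, since "7" > "50" lexicographically.
rows = [
    {"plan": "A", "enrollment": "120"},
    {"plan": "B", "enrollment": "7"},
    {"plan": "C", "enrollment": "n/a"},
]
print(plans_with_enrollment_over(rows))  # ['A']
```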
10 votes · 1 answer

How to convert varchar to array in Presto Athena

My data is in VARCHAR format. I want to split both the elements of this array so that I can then extract a key value from the JSON. Data format [ { "skuId": "5bc87ae20d298a283c297ca1", "unitPrice": 0, "id": "5bc87ae20d298a283c297ca1", …
dhankhar · 101 reputation
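Inside Athena, Presto's JSON functions cover this. For example, `CAST(json_parse(col) AS ARRAY(JSON))` combined with `UNNEST` turns the string into rows, and `json_extract_scalar(element, '$.skuId')` then pulls out one key. The minimal Python sketch below shows the same parse-then-extract idea. The sample value is hypothetical and shortened, using only the field names visible in the question.

```python
import json
from typing import Any, List


def extract_field(raw: str, field: str) -> List[Any]:
    """Parse a VARCHAR holding a JSON array of objects and return one key from each object."""
    items = json.loads(raw)
    if not isinstance(items, list):
        raise ValueError("expected a JSON array")
    return [item.get(field) for item in items if isinstance(item, dict)]


# Hypothetical, shortened value modelled on the fields visible in the question.
raw = '[{"skuId": "a1", "unitPrice": 0, "id": "a1"}, {"skuId": "b2", "unitPrice": 3, "id": "b2"}]'
print(extract_field(raw, "skuId"))  # ['a1', 'b2']
```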
10 votes · 1 answer

How does S3 Select pricing work? What do "data returned" and "data scanned" mean in S3 Select?

I have 1M rows of CSV data. If I select 10 rows, will I be billed for 10 rows? What do "data returned" and "data scanned" mean in S3 Select? There is little documentation on these S3 Select terms.
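S3 Select bills data scanned and data returned as separate components. Selecting 10 rows therefore does not mean paying only for 10 rows. For an uncompressed CSV object, the whole object is typically scanned even when only a few rows come back. The minimal sketch below keeps the two components apart. The per-GB rates are parameters rather than hardcoded prices, since no prices are given here. Per-request fees are left out.

```python
GB = 1024 ** 3


def s3_select_charge(bytes_scanned: int, bytes_returned: int,
                     scan_rate_per_gb: float, return_rate_per_gb: float) -> float:
    """Data-scanned charge plus data-returned charge; per-request fees are not included."""
    return bytes_scanned / GB * scan_rate_per_gb + bytes_returned / GB * return_rate_per_gb


# Hypothetical: a CSV object of 1 GB scanned in full while about 1 KB of matching rows
# comes back. The rates are placeholders, not real prices.
print(s3_select_charge(GB, 1024, scan_rate_per_gb=0.002, return_rate_per_gb=0.0007))
```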
10 votes · 2 answers

How to solve SQL injection for Athena?

I am writing a Spring Java program that accesses data from Athena, but I found that the Athena JDBC driver does not support PreparedStatement. Does anyone have an idea how to avoid SQL injection on Athena?
Tsing · 101 reputation
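Newer Athena clients may support parameterized queries, so check that first. Where driver-level prepared statements are not available, a common fallback has two parts. Identifiers are validated against a strict pattern. String literals are escaped by doubling single quotes, as in standard SQL. The minimal sketch below illustrates only that fallback, in Python rather than the question's Java. The table and column names are hypothetical.

```python
import re

_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def checked_identifier(name: str) -> str:
    """Accept only plain identifiers (letters, digits, underscore) for table/column names."""
    if not _IDENTIFIER.fullmatch(name):
        raise ValueError(f"illegal identifier: {name!r}")
    return name


def quote_literal(value: str) -> str:
    """Render a value as a SQL string literal by doubling any embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def build_lookup(table: str, column: str, value: str) -> str:
    """Hypothetical query builder: identifiers are validated, the value is quoted."""
    return (f"SELECT * FROM {checked_identifier(table)} "
            f"WHERE {checked_identifier(column)} = {quote_literal(value)}")


print(build_lookup("users", "name", "O'Brien"))
# SELECT * FROM users WHERE name = 'O''Brien'
```

`fullmatch` is used instead of a `^...$` pattern with `match`, because `$` would also accept a trailing newline.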
10 votes · 7 answers

How to delete / drop multiple tables in AWS Athena?

I am trying to drop a few tables from Athena and I cannot run multiple DROP queries at the same time. Is there a way to do it? Thanks!
Vidy · 111 reputation
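Since each Athena query execution runs a single statement, one workaround is to submit one `DROP TABLE` per table in a loop. The minimal sketch below builds the statements and hands each one to an injected `run_query` callable. In practice that callable might wrap boto3's `start_query_execution`, as in the commented-out wiring. The callable, database, table, and bucket names are all hypothetical. The Glue API's `batch_delete_table` is another route worth checking.

```python
from typing import Callable, Iterable, List


def drop_tables(database: str, tables: Iterable[str],
                run_query: Callable[[str], str]) -> List[str]:
    """Submit one DROP TABLE statement per table; return whatever IDs run_query hands back."""
    ids = []
    for table in tables:
        # Backticks quote identifiers in Athena DDL statements.
        ids.append(run_query(f"DROP TABLE IF EXISTS `{database}`.`{table}`"))
    return ids


# Example wiring (not executed here), assuming boto3 and a results bucket you own:
# client = boto3.client("athena")
# run = lambda sql: client.start_query_execution(
#     QueryString=sql,
#     ResultConfiguration={"OutputLocation": "s3://your-results-bucket/"},
# )["QueryExecutionId"]

# Local demo with a stub that records each statement instead of calling AWS.
submitted: List[str] = []
print(drop_tables("db", ["t1", "t2"], lambda sql: submitted.append(sql) or str(len(submitted))))
```

`start_query_execution` only submits a query, so a real loop may also need to poll for completion before treating a table as dropped.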
10 votes · 4 answers

Amazon Athena - How can I exclude the metadata when creating a table based on a query result

In Athena, I want to create a table based on the query result, but every query result contains 2 files, ".csv" and ".csv.metadata". All these files are in my table and the metadata makes the table look messy. Is there any way to ignore these…
Hilda Chang · 193 reputation
10 votes · 2 answers

Amazon Athena LEFT OUTER JOIN query not working as expected

I am trying to do a left outer join in Athena and my query looks like the following: SELECT customer.name, orders.price FROM customer LEFT OUTER JOIN order ON customer.id = orders.customer_id WHERE price IS NULL; Where each customer could only…
ahajib · 12,838 reputation
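The `LEFT OUTER JOIN ... WHERE price IS NULL` pattern is an anti-join. Every customer survives the join, unmatched ones get a NULL price, and the filter then keeps those. The minimal Python sketch below mimics that on hypothetical data. One subtlety: a customer whose matching order itself has a NULL price would also pass the filter. As an aside, the excerpt joins `order` but selects `orders.price`. Unless that is just a transcription slip, the mismatch alone would break the query.

```python
from typing import Dict, List, Optional, Tuple


def left_join_prices(customers: List[Dict], orders: List[Dict]) -> List[Tuple[str, Optional[float]]]:
    """LEFT OUTER JOIN: every customer appears; price is None when no order matches."""
    rows: List[Tuple[str, Optional[float]]] = []
    for c in customers:
        matches = [o["price"] for o in orders if o["customer_id"] == c["id"]]
        if matches:
            rows.extend((c["name"], p) for p in matches)
        else:
            rows.append((c["name"], None))
    return rows


# Hypothetical data.
customers = [{"id": 1, "name": "Ann"}, {"id": 2, "name": "Bob"}]
orders = [{"customer_id": 1, "price": 9.5}]
joined = left_join_prices(customers, orders)
without_orders = [name for name, price in joined if price is None]  # WHERE price IS NULL
print(without_orders)  # ['Bob']
```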
10 votes · 1 answer

Using an external table defined in the Glue Data Catalog with Redshift Spectrum

I have a table defined in Glue data catalog that I can query using Athena. As there is some data in the table that I want to use with other Redshift tables, can I access the table defined in Glue data catalog? What will be the create external table…
10 votes · 3 answers

Athena query results at specific path on S3

I am aware that running a saved Athena query stores results in an Amazon S3 location based on the name of the query and the date the query ran, as follows: {QueryLocation}/{QueryName|Saved}/{yyyy}/{mm}/{dd}/{QueryID}/ Is it possible to override…
siberiancrane · 586 reputation
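When queries are started through the API instead of being saved in the console, the results location comes from the `ResultConfiguration.OutputLocation` parameter of `StartQueryExecution`. The minimal sketch below only builds the keyword arguments for boto3's `start_query_execution`. The bucket, prefix, and database names are hypothetical. Athena still picks the result file names itself under whatever location you pass.

```python
from typing import Dict


def query_request(sql: str, database: str, bucket: str, prefix: str) -> Dict:
    """Keyword arguments for boto3's Athena start_query_execution with a chosen results path."""
    cleaned = prefix.strip("/")
    location = f"s3://{bucket}/{cleaned}/" if cleaned else f"s3://{bucket}/"
    return {
        "QueryString": sql,
        "QueryExecutionContext": {"Database": database},
        "ResultConfiguration": {"OutputLocation": location},
    }


# Usage (not executed here), assuming boto3 and suitable credentials:
# boto3.client("athena").start_query_execution(**query_request("SELECT 1", "db", "my-results", "reports/daily"))
print(query_request("SELECT 1", "db", "my-results", "/reports/daily/")["ResultConfiguration"])
```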