Questions tagged [amazon-athena]

Amazon Athena is a service for running SQL queries against data stored on Amazon S3. Amazon Athena is part of Amazon Web Services (AWS).

Amazon Athena is a service for running SQL queries against data stored in files on Amazon S3. Amazon Athena is part of Amazon Web Services (AWS).

Athena is powered by the Presto query engine and uses Apache Hive Metastore for database and table definitions. It supports both dynamic and static partitions for tables. Athena supports data stored in delimited text files, JSON, ORC, Avro, and Parquet.

Athena is a serverless tool - there is no infrastructure to manage, and cost is calculated by the quantity of data scanned during each query.

See the Athena Documentation for more.

3440 questions

vote

2 answers

How do I execute the SHOW PARTITIONS command on an Athena table?

I'm using AWS Athena with AWS Glue for the first time, with S3 providing a 'folder' structure which maps to the partitions in my data - I'm getting into the concepts so please excuse any mistaken description! I'm looking at what happens when I add…

asked Oct 17 '19 at 10:51

Elliveny

2,159
1
20
28

vote

1 answer

AWS Athena: Handling big numbers

I have files on S3 where two columns contain only positive integers which can be of 10^26. Unfortunately, according to AWS Docs Athena only supports values in a range up to 2^63-1 (approx 10^19). So at the moment these column are represented as a…

biginteger presto amazon-athena

asked Oct 16 '19 at 19:10

Ilya Kisil

2,490
2
17
31

vote

2 answers

Query by multiple values from an AWS Athena bucketed table

I have a bucketed table from which I want to query by multiple values. Here is an example: SELECT * FROM my_bucketed_table WHERE bucketed_column IN (value1, value2) The result is a full scan of the table, instead of using the index. When I used…

sql amazon-athena

asked Oct 15 '19 at 10:47

Ori N

vote

1 answer

Athena sub-query and LEFT JOIN data scanned optimization

There is a table with parquet data format of 20 GB and simple query will give results by scanning only 1GB of data. select columns from table1 where id in (id1, id2, idn) If same query is executed with a sub-query such as - select columns from…

presto amazon-athena trino

asked Oct 07 '19 at 12:28

Amrit Jangid

vote

1 answer

How to query ambiguous data types in Athena?

I have a data set stored in Parquet files crawled from S3 and registered in Glue Data Catalog. Some of the columns are of ambiguous type. For example column col is typed as struct. If I select from that table tbl, then…

presto amazon-athena

asked Oct 04 '19 at 13:44

Raffael

19,547
15
82
160

vote

1 answer

AWS Athena insert into with named columns not working in pyspark

I've created a little test table using pyspark query=""" CREATE EXTERNAL TABLE IF NOT EXISTS test1 ( c1 INT, c2 INT, c3 INT ) ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde' LOCATION 's3://mybucket/myfolder/' """ spark.sql(query) And…

sql pyspark amazon-athena

asked Oct 03 '19 at 08:42

user2699504

vote

1 answer

Athena throws Simba JDBC 11300 with ALTER TABLE ADD PARTITION

Running an ALTER TABLE ADD PARTITION using the Athena JDBC driver is throwing an Exception that is proving hard to understand, this is the code: package none; import java.sql.Connection; import java.sql.DriverManager; import…

jdbc amazon-athena

asked Oct 02 '19 at 18:16

jmng

2,479
1
25
38

vote

1 answer

Create athena table from files in S3 using AWS Glue

I am trying to create an Athena table from a file in an S3 bucket using AWS Glue. I am relatively new to AWS and so even a nudge in the right direction could be a big help. The problem is I can't parse the contents of the file in the S3 bucket. I…

amazon-web-services amazon-s3 amazon-athena

asked Oct 01 '19 at 15:52

Tom C

vote

1 answer

Create external table not working in spark, working in Athena

I'm attemptiing to use pyspark to create an external table. My code is similar to the following: query=""" CREATE EXTERNAL TABLE IF NOT EXISTS myschema.mytable ( col1 STRING, col2 STRING, col3 STRING, col4 STRING, ... ... a further 600+ columns…

pyspark amazon-athena

asked Sep 27 '19 at 08:00

user2699504

vote

1 answer

Format pivot data with multiple conditions

My current query is SELECT COUNT (DISTINCT("json_extract_scalar"("data", '$.user_id'))) AS users, event, date(timestamp) FROM tableName WHERE category='category' GROUP BY event, date(timestamp) ORDER BY date(timestamp) DESC Right now, the output it…

sql amazon-web-services amazon-athena

asked Sep 26 '19 at 11:33

1mpe7us

vote

2 answers

AWS Athena - Query over large external table generated from Glue crawler?

I have a large set of history log files on aws s3 that sum billions of lines, I used a glue crawler with a grok deserializer to generate an external table on Athena, but querying it has proven to be unfeasible. My queries have timed out and I am…

amazon-athena aws-glue

asked Sep 25 '19 at 18:54

Guga Figueiredo

vote

2 answers

cast and left functions in Athena

cast and convert function works as expected in Athena: SELECT code_2 as mydate, cast( code_2 as varchar) from some_table but how do I extract 8 leftmost characters? This throws an error: SELECT code_2 as mydate, left(cast( code_2 as…

sql presto amazon-athena

asked Sep 23 '19 at 05:13

shantanuo

31,689
78
245
403

vote

1 answer

Hive - Regex for the SYSLOG/ERRORLOG

I want to query the syslog(basically its my SQL error log) using Athena. here is my sample data. 2019-09-21T12:19:32.107Z 2019-09-21 12:19:24.17 Server Buffer pool extension is already disabled. No action is necessary.…

regex hive create-table amazon-athena regexserde

asked Sep 22 '19 at 16:37

TheDataGuy

2,712
6
37
89

vote

1 answer

Extracting from Json in AWS Athena or Presto

My query below does not give me any result WITH dataset AS ( SELECT responseelements FROM cloudtrail_logs WHERE useridentity.type = 'Root' AND eventname='CreateVpc' ORDER BY eventsource, eventname; AS blob ) …

json amazon-web-services presto amazon-athena

asked Sep 20 '19 at 12:52

Subrata Fouzdar

vote

0 answers

Using R to access Athena - federated account

I am trying to download some data from athena (AWS) - my account is managed by SAML authentication. I managed to get the authentication in place correctly (my credentials are stored in a profile called 'saml') My problem is actually related to the…

r amazon-web-services jdbc saml amazon-athena

asked Sep 18 '19 at 15:13

Michael

2,436
1
36
57

Prev 1 2 3

…

100