Questions tagged [amazon-athena]

Amazon Athena is a service for running SQL queries against data stored on Amazon S3. Amazon Athena is part of Amazon Web Services (AWS).

Amazon Athena is a service for running SQL queries against data stored in files on Amazon S3. Amazon Athena is part of Amazon Web Services (AWS).

Athena is powered by the Presto query engine and uses Apache Hive Metastore for database and table definitions. It supports both dynamic and static partitions for tables. Athena supports data stored in delimited text files, JSON, ORC, Avro, and Parquet.

Athena is a serverless tool - there is no infrastructure to manage, and cost is calculated by the quantity of data scanned during each query.

See the Athena Documentation for more.

3440 questions
1
vote
2 answers

How do I execute the SHOW PARTITIONS command on an Athena table?

I'm using AWS Athena with AWS Glue for the first time, with S3 providing a 'folder' structure which maps to the partitions in my data - I'm getting into the concepts so please excuse any mistaken description! I'm looking at what happens when I add…
Elliveny
  • 2,159
  • 1
  • 20
  • 28
1
vote
1 answer

AWS Athena: Handling big numbers

I have files on S3 where two columns contain only positive integers which can be of 10^26. Unfortunately, according to AWS Docs Athena only supports values in a range up to 2^63-1 (approx 10^19). So at the moment these column are represented as a…
Ilya Kisil
  • 2,490
  • 2
  • 17
  • 31
1
vote
2 answers

Query by multiple values from an AWS Athena bucketed table

I have a bucketed table from which I want to query by multiple values. Here is an example: SELECT * FROM my_bucketed_table WHERE bucketed_column IN (value1, value2) The result is a full scan of the table, instead of using the index. When I used…
Ori N
  • 555
  • 10
  • 22
1
vote
1 answer

Athena sub-query and LEFT JOIN data scanned optimization

There is a table with parquet data format of 20 GB and simple query will give results by scanning only 1GB of data. select columns from table1 where id in (id1, id2, idn) If same query is executed with a sub-query such as - select columns from…
Amrit Jangid
  • 168
  • 1
  • 9
1
vote
1 answer

How to query ambiguous data types in Athena?

I have a data set stored in Parquet files crawled from S3 and registered in Glue Data Catalog. Some of the columns are of ambiguous type. For example column col is typed as struct. If I select from that table tbl, then…
Raffael
  • 19,547
  • 15
  • 82
  • 160
1
vote
1 answer

AWS Athena insert into with named columns not working in pyspark

I've created a little test table using pyspark query=""" CREATE EXTERNAL TABLE IF NOT EXISTS test1 ( c1 INT, c2 INT, c3 INT ) ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde' LOCATION 's3://mybucket/myfolder/' """ spark.sql(query) And…
user2699504
  • 195
  • 1
  • 4
  • 18
1
vote
1 answer

Athena throws Simba JDBC 11300 with ALTER TABLE ADD PARTITION

Running an ALTER TABLE ADD PARTITION using the Athena JDBC driver is throwing an Exception that is proving hard to understand, this is the code: package none; import java.sql.Connection; import java.sql.DriverManager; import…
jmng
  • 2,479
  • 1
  • 25
  • 38
1
vote
1 answer

Create athena table from files in S3 using AWS Glue

I am trying to create an Athena table from a file in an S3 bucket using AWS Glue. I am relatively new to AWS and so even a nudge in the right direction could be a big help. The problem is I can't parse the contents of the file in the S3 bucket. I…
Tom C
  • 322
  • 3
  • 12
1
vote
1 answer

Create external table not working in spark, working in Athena

I'm attemptiing to use pyspark to create an external table. My code is similar to the following: query=""" CREATE EXTERNAL TABLE IF NOT EXISTS myschema.mytable ( col1 STRING, col2 STRING, col3 STRING, col4 STRING, ... ... a further 600+ columns…
user2699504
  • 195
  • 1
  • 4
  • 18
1
vote
1 answer

Format pivot data with multiple conditions

My current query is SELECT COUNT (DISTINCT("json_extract_scalar"("data", '$.user_id'))) AS users, event, date(timestamp) FROM tableName WHERE category='category' GROUP BY event, date(timestamp) ORDER BY date(timestamp) DESC Right now, the output it…
1mpe7us
  • 35
  • 6
1
vote
2 answers

AWS Athena - Query over large external table generated from Glue crawler?

I have a large set of history log files on aws s3 that sum billions of lines, I used a glue crawler with a grok deserializer to generate an external table on Athena, but querying it has proven to be unfeasible. My queries have timed out and I am…
Guga Figueiredo
  • 292
  • 4
  • 18
1
vote
2 answers

cast and left functions in Athena

cast and convert function works as expected in Athena: SELECT code_2 as mydate, cast( code_2 as varchar) from some_table but how do I extract 8 leftmost characters? This throws an error: SELECT code_2 as mydate, left(cast( code_2 as…
shantanuo
  • 31,689
  • 78
  • 245
  • 403
1
vote
1 answer

Hive - Regex for the SYSLOG/ERRORLOG

I want to query the syslog(basically its my SQL error log) using Athena. here is my sample data. 2019-09-21T12:19:32.107Z 2019-09-21 12:19:24.17 Server Buffer pool extension is already disabled. No action is necessary.…
TheDataGuy
  • 2,712
  • 6
  • 37
  • 89
1
vote
1 answer

Extracting from Json in AWS Athena or Presto

My query below does not give me any result WITH dataset AS ( SELECT responseelements FROM cloudtrail_logs WHERE useridentity.type = 'Root' AND eventname='CreateVpc' ORDER BY eventsource, eventname; AS blob ) …
Subrata Fouzdar
  • 724
  • 5
  • 17
1
vote
0 answers

Using R to access Athena - federated account

I am trying to download some data from athena (AWS) - my account is managed by SAML authentication. I managed to get the authentication in place correctly (my credentials are stored in a profile called 'saml') My problem is actually related to the…
Michael
  • 2,436
  • 1
  • 36
  • 57
1 2 3
99
100