Questions tagged [amazon-athena]

Amazon Athena is a service for running SQL queries against data stored on Amazon S3. Amazon Athena is part of Amazon Web Services (AWS).

Amazon Athena is a service for running SQL queries against data stored in files on Amazon S3. Amazon Athena is part of Amazon Web Services (AWS).

Athena is powered by the Presto query engine and uses Apache Hive Metastore for database and table definitions. It supports both dynamic and static partitions for tables. Athena supports data stored in delimited text files, JSON, ORC, Avro, and Parquet.

Athena is a serverless tool - there is no infrastructure to manage, and cost is calculated by the quantity of data scanned during each query.

See the Athena Documentation for more.

3440 questions
10
votes
2 answers

Query exhausted resources at this scale factor

I was running SQL query on Amazon Athena. And I got the following error couple of times: Query exhausted resources at this scale factor This query ran against the "test1" database, unless qualified by the query. Please post the error message on our…
Ridwan
  • 301
  • 5
  • 12
10
votes
3 answers

How to skip headers when we are reading data from a csv file in s3 and creating a table in aws athena.

I am trying to read csv data from s3 bucket and creating a table in AWS Athena. My table when created was unable to skip the header information of my CSV file. Query Example : CREATE EXTERNAL TABLE IF NOT EXISTS table_name ( `event_type_id` …
10
votes
2 answers

How does AWS Athena react to schema changes in S3 files?

What happens when after creating the table in AWS Athena for files on S3, the structure of the files on S3 change? For eg: If the files previously had 5 columns when the table was created and later the new files started getting 1 more column: a) at…
Sarang
  • 147
  • 3
  • 9
10
votes
4 answers

AWS Athena: use "folder" name as partition

I have thousands of individual json files (corresponding to one Table row) stored in s3 with the following path: s3://my-bucket//dataXX.json When I create my table in DDL, is it possible to have the data partitioned by the present in the S3…
Raphael
  • 998
  • 1
  • 7
  • 8
9
votes
2 answers

Using Athena SQL with regex

I'm using DbVisualizer to connect to an athena instance. I have a working query: SELECT device, description, id, size, date FROM test.database WHERE month = '01' and device not like '%link%' and device not like '%Link%' and device not like…
chowpay
  • 1,515
  • 6
  • 22
  • 44
9
votes
2 answers

Athena: Queries of this type are not supported

I have the current query in athena. SELECT col1, col_2, A.col_3 FROM (SELECT col_1, col_3 FROM table_1 JOIN col_3 WHERE col_1 IN (SELECT DISTINCT col_1 FROM…
César Correa
  • 178
  • 1
  • 2
  • 11
9
votes
3 answers

AWS Athena ALIAS in Group By does not get resolved

I have a very basic group by query in Athena where I would like to use an alias. One can make the example work by putting the same reference in the group by, but that's not really handy when there's complex column modifications going on and logic…
supernova
  • 1,762
  • 1
  • 14
  • 31
9
votes
3 answers

How to extract month name on a string datatype on athena

SELECT sales_invoice_date, MONTH( DATE_TRUNC('month', CASE WHEN TRIM(sales_invoice_date) = '' THEN DATE('1999-12-31') ELSE …
Ray
  • 133
  • 1
  • 1
  • 7
9
votes
2 answers

S3/Athena query result location and “Invalid S3 folder location”

Are there particular requirements to the bucket for specifying the query result location? When I try to create a new table, I get a popup: Before you run your first query, you need to set up a query result location in Amazon S3. Learn more So I…
tom
  • 1,331
  • 1
  • 15
  • 28
9
votes
1 answer

How call date_trunc function in amazon Athena?

I am trying to select the date_trunc value: select date_trunc(HOUR, current_date - interval '1' hour); OR select date_trunc(HOUR, current_date); And got error: [42703] ERROR: column "hour" does not exist Позиция: 19
Cherry
  • 31,309
  • 66
  • 224
  • 364
9
votes
3 answers

Specify a SerDe serialization lib with AWS Glue Crawler

Every time I run a glue crawler on existing data, it changes the Serde serialization lib to LazySimpleSerDe, which doesn't classify correctly (e.g. for quoted fields with commas in) I then need to manually edit the table details in the Glue Catalog…
9
votes
2 answers

athena.getQueryExecution returns a status of RUNNING for 1 minute+, even though history says the job took 4 seconds

I'm writing a node app using typescript and the aws-sdk to call Athena. Occasionally, the call to Athena will timeout after a minute of waiting for the Athena response. The thing I don't understand is that if I go to the AWS console and look at…
Daniel Kaplan
  • 62,768
  • 50
  • 234
  • 356
9
votes
1 answer

Selecting specific files for athena

While creating a table in Athena, I am not able to create tables using specific files. Is there any way to select all the files starting with "year_2019" from a given bucket? For e.g. s3://bucketname/prefix/year_2019*.csv The documentation is very…
shantanuo
  • 31,689
  • 78
  • 245
  • 403
9
votes
0 answers

Reusing subqueries in AWS Athena generate large amount of data scanned

On AWS Athena, I am trying to reuse computed data using a WITH clause, e.g. WITH temp_table AS (...) SELECT ... FROM temp_table t0, temp_table t1, temp_table t2 WHERE ... If the query is fast, the "Data scanned" goes through the roof. As if…
user7094
  • 215
  • 2
  • 7
9
votes
1 answer

Can I use Athena View as a source for a AWS Glue Job?

I'm trying to use an Athena View as a data source to my AWS Glue Job. The error message I'm getting while trying to run the Glue job is about the classification of the view. What can I define it as? Thank you Error Message Appearing