Questions tagged [amazon-athena]

Amazon Athena is a service for running SQL queries against data stored on Amazon S3. Amazon Athena is part of Amazon Web Services (AWS).

Amazon Athena is a service for running SQL queries against data stored in files on Amazon S3. Amazon Athena is part of Amazon Web Services (AWS).

Athena is powered by the Presto query engine and uses Apache Hive Metastore for database and table definitions. It supports both dynamic and static partitions for tables. Athena supports data stored in delimited text files, JSON, ORC, Avro, and Parquet.

Athena is a serverless tool - there is no infrastructure to manage, and cost is calculated by the quantity of data scanned during each query.

See the Athena Documentation for more.

3440 questions
1
vote
1 answer

PrestoDB/AWS Athena- Retrieve a large SELECT by chunks

I have to select more than 1.9 billion rows. I am trying to query a table hosted in a DB in AWS ATHENA console. The table is reading parquet files from the a S3 bucket. When I run this query: SELECT * FROM ids WHERE org = 'abcd' AND idkey =…
Raj
  • 93
  • 1
  • 11
1
vote
2 answers

Amazon Athena (Presto) SELECT statement to create (n^2 + n)/2 (th triangular number)

I'm using Athena and trying to find a way to create a select statement that will return a sequence in the below format: Numer 1 2 2 3 3 3 4 4 4 4 And so on, up to 200. Is it even possible?
Byrdziu
  • 107
  • 1
  • 9
1
vote
1 answer

HIVE_INVALID_METADATA: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe: columns has 43 elements while columns.types has 34 elements

I'm working on a client platform. It is a DataLake linked to AWS S3 and AWS ATHENA I've uploaded a Dataset to an S3 Bucket using AWS GLUE. The job ran successfully and a table was created under ATHENA. When I try to "Preview" the content of the…
1
vote
2 answers

Connect to AWS Athena with JDBC driver when AWS access is given through (Azure) OIDC provider

We connect to AWS through our office365 users given to us by our company. Since we assume a role in the UI, how would I connect to Athena from a JDBC tool like DataGrip?
1
vote
1 answer

Redshift Spectrum much slower than Athena?

Our data is stored in S3 as JSON without partitions. Until today we were using only athena but now we tried Redshift Spectrum. We are running the same query twice. Once using Redshift Spectrum and once using Athena. Both connect to the same data in…
AJ222
  • 1,114
  • 2
  • 18
  • 40
1
vote
1 answer

SQL query to get week number from date while setting week starting from Sunday

I am trying to run a query in AWS Athena to get the week number from a date. I need the week to start on Sunday. I have tried the below query and it works but it considers the week starts from Monday. Extract(week from date) as week_number Google…
man utd
  • 37
  • 1
  • 5
1
vote
1 answer

Athena: is datatype `struct` equivalent to `map`?

My dataflow is: json -> parquet -> Athena, but I'm having a problem with the nested key-value field tags. The JSON file is [{"myid":1,"name":"foo","tags":{"tag1":"a","tag2":"b"}}, {"myid":2,"name":"bar","tags":{"tag1":"c","tag2":"d"}} ] The Athena…
Shadi
  • 9,742
  • 4
  • 43
  • 65
1
vote
1 answer

Is there a way to pull data-range in AWS Athena when the timestamp is big int using mySQL & PrestoDB?

I am trying to query from a table where I am particularly interested to pull the last 7 days from the timestamp column. The timestamp column has been stored in the table as big int. I tried using this after converting the 'big int' timestamp to…
Raj
  • 93
  • 1
  • 11
1
vote
2 answers

How to skip documents that do not match schema in Athena?

Suppose I have an external table like this: CREATE EXTERNAL TABLE my.data ( `id` string, `timestamp` string, `profile` struct< `name`: string, `score`: int> ) ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe' WITH SERDEPROPERTIES…
sdgfsdh
  • 33,689
  • 26
  • 132
  • 245
1
vote
2 answers

SQL partially order results

I have a table in Amazon Athena (a Presto engine) of some user events that among others have userId (VARCHAR) and ts (timestamp, BIGINT) column. The table is quite big, some hundreds of millions of records. I would like to create a query that orders…
MrTJ
  • 13,064
  • 4
  • 41
  • 63
1
vote
1 answer

How do I identify problematic documents in S3 when querying data in Athena?

I have a basic Athena query like this: SELECT * FROM my.dataset LIMIT 10 When I try to run it I get an error message like this: Your query has the following error(s): HIVE_BAD_DATA: Error parsing field value for field 2: For input string:…
sdgfsdh
  • 33,689
  • 26
  • 132
  • 245
1
vote
2 answers

SQL to get next monday in Presto

I need to get the date for next Monday irrespective of which day the sql is executed. I think the logic 7 - day_of_week(current_date)%7 + 1 would work, but then how to get the date. select current_date; _col0 1 2019-11-16 select (7 -…
Anand
  • 145
  • 1
  • 3
  • 10
1
vote
1 answer

Force athena/presto to respect ordering of files from S3

I have a file where the ordering is important. Essentially the file contains multiple record types, and first record of the group defines the ID that links subsequent records, until another "first" record is found. There can be any number of…
Codek
  • 5,114
  • 3
  • 24
  • 38
1
vote
1 answer

How to connect Athena with Power BI using ODBC

How to connect AWS Athena with Power BI using ODBC. I have installed 64 bit Simba Athena Connection. I have added appropriate bucket policy. { "Version": "2012-10-17", "Statement": [ { "Effect": "Allow", …
Srihari Karanth
  • 2,067
  • 2
  • 24
  • 34
1
vote
1 answer

Best way to save pandas DataFrame to parquet with date type

I'm trying to save DataFrame with date type column to a parquet format to be used later in Athena. As far as I understand parquet has native DATE type, by the only type I can really use is datetime64[ns] with pyarrow engine (here is the same issue …
kismsu
  • 1,049
  • 7
  • 22