Questions tagged [amazon-athena]

Amazon Athena is a service for running SQL queries against data stored on Amazon S3. Amazon Athena is part of Amazon Web Services (AWS).

Amazon Athena is a service for running SQL queries against data stored in files on Amazon S3. Amazon Athena is part of Amazon Web Services (AWS).

Athena is powered by the Presto query engine and uses Apache Hive Metastore for database and table definitions. It supports both dynamic and static partitions for tables. Athena supports data stored in delimited text files, JSON, ORC, Avro, and Parquet.

Athena is a serverless tool - there is no infrastructure to manage, and cost is calculated by the quantity of data scanned during each query.

See the Athena Documentation for more.

3440 questions

votes

9 answers

AWS Glue Crawler Cannot Extract CSV Headers

At my wits end here... I have 15 csv files that I am generating from a beeline query like: beeline -u CONN_STR --outputformat=dsv -e "SELECT ... " > data.csv I chose dsv because some string fields include commas and they are not quoted, which…

csv amazon-athena aws-glue

asked Jan 25 '19 at 21:57

Mac

1,143
6
21
45

votes

6 answers

Pandas dataframe type datetime64[ns] is not working in Hive/Athena

I am working on a python application which just converts csv file to hive/athena compatible parquet format and I am using fastparquet and pandas libraries to perform this. There are timestamp values in csv file like 2018-12-21 23:45:00 which needs…

python pandas hive amazon-athena fastparquet

asked Dec 25 '18 at 06:06

prasannads

votes

2 answers

Athena unable to parse date using OpenCSVSerde

I have a very simple csv file on S3 "i","d","f","s" "1","2018-01-01","1.001","something great!" "2","2018-01-02","2.002","something terrible!" "3","2018-01-03","3.003","I'm an oil man" I'm trying to create a table across this using the following…

csv hive opencsv amazon-athena presto

asked Sep 29 '18 at 01:02

Kirk Broadhurst

27,836
16
104
169

votes

3 answers

Athena date_parse for date with optional millisecond field

I have date in S3 using which I created an Athena table. I have some date entries in S3 in json format which Athena is not accepting as either Date or timestamp when am running the queries. Using AWS Athena which uses Prestodb as query…

sql amazon-web-services amazon-athena presto

asked Sep 12 '18 at 21:57

Scorpion

votes

2 answers

AWS Glue: crawler misinterprets timestamps as strings. GLUE ETL meant to convert strings to timestamps makes them NULL

I have been playing around with AWS Glue for some quick analytics by following the tutorial here While I have been able to successfully create crawlers and discover data in Athena, I've had issues with the data types created by the crawler. The…

amazon-web-services amazon-s3 amazon-athena aws-glue

asked Aug 27 '18 at 03:51

Balajee Addanki

votes

2 answers

Can't Query Athena Table Because of Dash Character

I have a table called fpa-dev in Athena (created by Glue). When I run this simple query: SELECT * FROM fpa-dev LIMIT 10 it gives me this error: extraneous input '-' expecting {, '.', ',', 'add', 'as', 'all', 'some', 'any', 'where', 'group',…

amazon-web-services amazon-athena

asked Jul 31 '18 at 00:42

Spencer Sutton

2,907
2
18
19

votes

5 answers

AWS Athena (Presto) OFFSET support

I would like to know if there is support for OFFSET in AWS Athena. For mysql the following query is running but in athena it is giving me error. Any example would be helpful. select * from employee where empSal >3000 LIMIT 300 OFFSET 20

sql amazon-web-services presto amazon-athena

asked Jul 12 '18 at 06:26

NinjaDev

votes

3 answers

SHOW PARTITIONS with order by in Amazon Athena

I have this query: SHOW PARTITIONS tablename; Result is: dt=2018-01-12 dt=2018-01-20 dt=2018-05-21 dt=2018-04-07 dt=2018-01-03 This gives the list of partitions per table. The partition field for this table is dt which is a date column. I want to…

amazon-athena

asked Jun 07 '18 at 08:02

jack

votes

1 answer

Sanitizing SQL query parameters in AWS Athena

In AWS Athena, there doesn't appear to be a good way to pass parameters into the SQL query. The typical way to execute is from a boto3 call, response = client.start_query_execution( QueryString='string', ClientRequestToken='string', …

sql-injection amazon-athena

asked Feb 14 '18 at 19:10

user3002273

votes

1 answer

How to handle embed line breaks in AWS Athena

I have created a table in AWS Athena like this: CREATE EXTERNAL TABLE IF NOT EXISTS default.test_line_breaks ( col1 string, col2 string ) ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde' WITH SERDEPROPERTIES ( 'separatorChar' =…

amazon-web-services amazon-athena

asked Dec 22 '17 at 05:19

antklim

votes

3 answers

Amazon AWS Athena S3 and Glacier Mixed Bucket

Amazon Athena Log Analysis Services with S3 Glacier We have petabytes of data in S3. We are https://www.pubnub.com/ and we store usage data in S3 of our network for billing purposes. We have tab delimited log files stored in an S3 bucket. Athena…

amazon-web-services amazon-s3 amazon-glacier amazon-athena

asked Jan 25 '17 at 22:33

Stephen Blum

6,498
2
34
46

votes

2 answers

Quicksight Athena - [Simba][AthenaJDBC](100071) An error has been thrown from the AWS Athena client

I am getting below error when i am trying to create New Dataset in Quicksight from Athena. I tried for "Table" and "view". It gives below error : [Simba][AthenaJDBC](100071) An error has been thrown from the AWS Athena…

amazon-s3 amazon-athena amazon-quicksight

asked Nov 12 '21 at 20:47

Bokambo

4,204
27
79
130

votes

3 answers

AWS Athena - duplicate columns due to partitionning

We have a glue crawler that read avro files in S3 and create a table in glue catalog accordingly. The thing is that we have a column named 'foo' that came from the avro schema and we also have something like 'foo=XXXX' in the s3 bucket path, to have…

amazon-web-services amazon-s3 avro aws-glue amazon-athena

asked Dec 10 '19 at 13:47

Yannick

1,240
2
13
25

votes

2 answers

AWS Athena create table and partition

I stored my sensor data in S3 (write data every 5 minutes): farm_iot/sensor_data/farm/farm0001/sensor01/1541252701443 1541252701443 is a json file containing measurements: { "temperature": 14.78, "pressure": 961.70, "humidity": 68.32} I am…

amazon-web-services amazon-s3 hive amazon-athena

asked Dec 01 '18 at 11:47

moin moin

2,263
5
32
51

votes

1 answer

Spark dataframe column naming conventions / restrictions

I have run into issues with the default naming (as imported from .csv files received) of my (Py)Spark column names multiple times now. Things that seem to mess with Spark are MixedCase and things like . or - in the column names. So I decided to find…

apache-spark hive pyspark naming-conventions amazon-athena

asked Oct 26 '18 at 14:05

Thomas

4,696
5
36
71

Prev 1 2 3

…

99 100 Next