Questions tagged [amazon-athena]

Amazon Athena is a service for running SQL queries against data stored on Amazon S3. Amazon Athena is part of Amazon Web Services (AWS).

Amazon Athena is a service for running SQL queries against data stored in files on Amazon S3. Amazon Athena is part of Amazon Web Services (AWS).

Athena is powered by the Presto query engine and uses Apache Hive Metastore for database and table definitions. It supports both dynamic and static partitions for tables. Athena supports data stored in delimited text files, JSON, ORC, Avro, and Parquet.

Athena is a serverless tool - there is no infrastructure to manage, and cost is calculated by the quantity of data scanned during each query.

See the Athena Documentation for more.

3440 questions
12
votes
9 answers

AWS Glue Crawler Cannot Extract CSV Headers

At my wits end here... I have 15 csv files that I am generating from a beeline query like: beeline -u CONN_STR --outputformat=dsv -e "SELECT ... " > data.csv I chose dsv because some string fields include commas and they are not quoted, which…
Mac
  • 1,143
  • 6
  • 21
  • 45
12
votes
6 answers

Pandas dataframe type datetime64[ns] is not working in Hive/Athena

I am working on a python application which just converts csv file to hive/athena compatible parquet format and I am using fastparquet and pandas libraries to perform this. There are timestamp values in csv file like 2018-12-21 23:45:00 which needs…
prasannads
  • 609
  • 2
  • 14
  • 28
12
votes
2 answers

Athena unable to parse date using OpenCSVSerde

I have a very simple csv file on S3 "i","d","f","s" "1","2018-01-01","1.001","something great!" "2","2018-01-02","2.002","something terrible!" "3","2018-01-03","3.003","I'm an oil man" I'm trying to create a table across this using the following…
Kirk Broadhurst
  • 27,836
  • 16
  • 104
  • 169
12
votes
3 answers

Athena date_parse for date with optional millisecond field

I have date in S3 using which I created an Athena table. I have some date entries in S3 in json format which Athena is not accepting as either Date or timestamp when am running the queries. Using AWS Athena which uses Prestodb as query…
Scorpion
  • 633
  • 3
  • 11
  • 24
12
votes
2 answers

AWS Glue: crawler misinterprets timestamps as strings. GLUE ETL meant to convert strings to timestamps makes them NULL

I have been playing around with AWS Glue for some quick analytics by following the tutorial here While I have been able to successfully create crawlers and discover data in Athena, I've had issues with the data types created by the crawler. The…
12
votes
2 answers

Can't Query Athena Table Because of Dash Character

I have a table called fpa-dev in Athena (created by Glue). When I run this simple query: SELECT * FROM fpa-dev LIMIT 10 it gives me this error: extraneous input '-' expecting {, '.', ',', 'add', 'as', 'all', 'some', 'any', 'where', 'group',…
Spencer Sutton
  • 2,907
  • 2
  • 18
  • 19
12
votes
5 answers

AWS Athena (Presto) OFFSET support

I would like to know if there is support for OFFSET in AWS Athena. For mysql the following query is running but in athena it is giving me error. Any example would be helpful. select * from employee where empSal >3000 LIMIT 300 OFFSET 20
NinjaDev
  • 301
  • 1
  • 7
  • 19
12
votes
3 answers

SHOW PARTITIONS with order by in Amazon Athena

I have this query: SHOW PARTITIONS tablename; Result is: dt=2018-01-12 dt=2018-01-20 dt=2018-05-21 dt=2018-04-07 dt=2018-01-03 This gives the list of partitions per table. The partition field for this table is dt which is a date column. I want to…
jack
  • 821
  • 5
  • 16
  • 28
12
votes
1 answer

Sanitizing SQL query parameters in AWS Athena

In AWS Athena, there doesn't appear to be a good way to pass parameters into the SQL query. The typical way to execute is from a boto3 call, response = client.start_query_execution( QueryString='string', ClientRequestToken='string', …
user3002273
12
votes
1 answer

How to handle embed line breaks in AWS Athena

I have created a table in AWS Athena like this: CREATE EXTERNAL TABLE IF NOT EXISTS default.test_line_breaks ( col1 string, col2 string ) ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde' WITH SERDEPROPERTIES ( 'separatorChar' =…
antklim
  • 381
  • 3
  • 7
12
votes
3 answers

Amazon AWS Athena S3 and Glacier Mixed Bucket

Amazon Athena Log Analysis Services with S3 Glacier We have petabytes of data in S3. We are https://www.pubnub.com/ and we store usage data in S3 of our network for billing purposes. We have tab delimited log files stored in an S3 bucket. Athena…
11
votes
2 answers

Quicksight Athena - [Simba][AthenaJDBC](100071) An error has been thrown from the AWS Athena client

I am getting below error when i am trying to create New Dataset in Quicksight from Athena. I tried for "Table" and "view". It gives below error : [Simba][AthenaJDBC](100071) An error has been thrown from the AWS Athena…
Bokambo
  • 4,204
  • 27
  • 79
  • 130
11
votes
3 answers

AWS Athena - duplicate columns due to partitionning

We have a glue crawler that read avro files in S3 and create a table in glue catalog accordingly. The thing is that we have a column named 'foo' that came from the avro schema and we also have something like 'foo=XXXX' in the s3 bucket path, to have…
Yannick
  • 1,240
  • 2
  • 13
  • 25
11
votes
2 answers

AWS Athena create table and partition

I stored my sensor data in S3 (write data every 5 minutes): farm_iot/sensor_data/farm/farm0001/sensor01/1541252701443 1541252701443 is a json file containing measurements: { "temperature": 14.78, "pressure": 961.70, "humidity": 68.32} I am…
moin moin
  • 2,263
  • 5
  • 32
  • 51
11
votes
1 answer

Spark dataframe column naming conventions / restrictions

I have run into issues with the default naming (as imported from .csv files received) of my (Py)Spark column names multiple times now. Things that seem to mess with Spark are MixedCase and things like . or - in the column names. So I decided to find…
Thomas
  • 4,696
  • 5
  • 36
  • 71