16

I am currently evaluating Amazon Athena and Amazon S3. I have created a database (testdb) with one table (awsevaluationtable). The table has two columns, x (bigint) and y (bigint).

When I run:

SELECT * 
FROM testdb."awsevaluationtable"

I get all of the test data: Successful Query

However, when I try a basic WHERE query:

SELECT * 
FROM testdb."awsevaluationtable" 
WHERE x > 5

I get:

SYNTAX_ERROR: line 3:7: Column 'x' cannot be resolved

I have tried all sorts of variations:

SELECT * FROM testdb.awsevaluationtable WHERE x > 5
SELECT * FROM awsevaluationtable WHERE x > 5
SELECT * FROM testdb."awsevaluationtable" WHERE X > 5
SELECT * FROM testdb."awsevaluationtable" WHERE testdb."awsevaluationtable".x > 5
SELECT * FROM testdb.awsevaluationtable WHERE awsevaluationtable.x > 5

I have also confirmed that the x column exists with:

SHOW COLUMNS IN sctawsevaluation

Column query

This seems like an extremely simple query yet I can't figure out what is wrong. I don't see anything obvious in the documentation. Any suggestions would be appreciated.

Joel
  • 399
  • 1
  • 4
  • 16
  • 2
    Did you try it with different name for the columns? – SezMe Aug 22 '18 at 19:53
  • 1
    This lead me to figuring it out, thanks! – Joel Aug 22 '18 at 21:11
  • 10
    In my case I was using `column_name="val"` when i should have been trying to use `column_name='val'` ... single quotes on the value – jmunsch Jun 19 '19 at 01:43
  • 3
    Single quotes also did it for me. Seems crazy that the syntax is so strict. – snakesNbronies Dec 10 '19 at 05:44
  • 1
    Description with some more details from AWS can be found here: https://aws.amazon.com/de/premiumsupport/knowledge-center/athena-syntax-error-column-id/ – supernova Feb 09 '20 at 23:44
  • Remember that only one Athena SQL query (Trino SQL) can be run at a time; this may or may not be an issue for the OP, but it could affect people reading this. Hopefully, AWS will correct this "bug" at some point! – JosephDoggie Oct 18 '21 at 16:51

3 Answers3

40

In my case, changing double quotes to single quotes resolves this error.

Presto uses single quotes for string literals, and uses double quotes for identifiers.

https://trino.io/docs/current/migration/from-hive.html#use-ansi-sql-syntax-for-identifiers-and-strings

Strings are delimited with single quotes and identifiers are quoted with double quotes, not backquotes:

SELECT name AS "User Name"
FROM "7day_active"
WHERE name = 'foo'
Martin Traverso
  • 4,731
  • 15
  • 24
nekketsuuu
  • 1,641
  • 1
  • 21
  • 26
  • 3
    wow that actually worked for me as well with aws Athena – Kenan Jul 31 '20 at 19:33
  • This is a solution to a different problem, mistakenly using double quotes on a string literal. In this question there's no string literal. I hope no-one reads this answer and changes "mycolumnname" to 'mycolumnname' - that'd be a string literal, not a column value. – jedwidz Jul 20 '22 at 05:05
  • True, but I wrote an example query and explained what quotes should be used for what. Please read my answer. / History: First time I encountered this error message, I googled it, and opened this Q&A. After that I found the problem of my case, and wrote this answer. I noticed this question does not uses any string literals some day after posting the answer, but this answer got some upvotes then. So I let this answer leave as is. I welcome to edits to improve this Q&A while keeping it simple. It may be good to post a new question and migrating this answer to that question. – nekketsuuu Jul 21 '22 at 04:05
7

I have edited my response to this issue based on my current findings and my contact with both the AWS Glue and Athena support teams.

We were having the same issue - an inability to query on the first column in our CSV files. The problem comes down to the encoding of the CSV file. In short, AWS Glue and Athena currently do not support CSV's encoded in UTF-8-BOM. If you open up a CSV encoded with a Byte Order Mark (BOM) in Excel or Notepad++, it looks like any comma-delimited text file. However, opening it up in a Hex editor reveals the underlying issue. There are a bunch of special characters at the start of the file:  i.e. the BOM.

When a UTF-8-BOM CSV file is processed in AWS Glue, it retains these special characters, and associates then with the first column name. When you try and query on the first column within Athena, you will generate an error.

There are ways around this on AWS:

  • In AWS Glue, edit the table schema and delete the first column, then reinsert it back with the proper column name, OR

  • In AWS Athena, execute the SHOW CREATE TABLE DDL to script out the problematic table, remove the special character in the generated script, then run the script to create a new table which you can query on.

To make your life simple, just make sure your CSV's are encoded as UTF-8.

owl7
  • 308
  • 1
  • 3
  • 9
2

I noticed that the csv source of the original table had column headers with capital letters (X and Y) unlike the column names that were being displayed in Athena. So I removed the table, edited the csv file so that the headers were lowercase (x and y), then recreated the table and now it works!

Joel
  • 399
  • 1
  • 4
  • 16