9

I am getting the below error while querying the table created in Amazon Athena.

Error

HIVE_CURSOR_ERROR: Row is not a valid JSON Object - JSONException: A JSONObject text must end with '}' at 2

The sample file which I am using and the query to create table is below. The table gets successfully created by the below query but when I am fetching the result from table I am getting the error. Please provide your valuable suggestion.

Note Sample Data Create table

abc
  • 117
  • 1
  • 1
  • 7

2 Answers2

16

AWS Athena does not support multi-line JSON.

Athena knowledge center

Make sure your JSON record is on a single line

Athena doesn't currently support multi-line JSON records.

jens walter
  • 13,269
  • 2
  • 56
  • 54
  • 2
    Thanks jens, but I amazon website it is depicted to cater multiline json file and also the table creation syntax is mentioned for the same. I have attached the both in my original question. – abc Jun 20 '17 at 08:13
2

I'm abusing the answer field to have more space and to have my thoughts on this a little bit structured. I hope this is useful input to anyone using Athena.

I'm using Athena to create two tables. Single-line-json-based and multi-line-json-based reports in two separate bucket folders and two according tables.

Single-line reports in JSON are straightforward. For table creation I use this driver:

ROW FORMAT serde 'org.apache.hive.hcatalog.data.JsonSerDe'

However if you use this exact one to consume multi-line you will get an error PLUS there are three more things (I, II and III) to be aware of. First of all the different driver:

ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'

then

I the (invalid) JSON format. My multi-line files look like this:

{"key": "value"}
{"key": "value"}

Notice there is one JSON object per line and objects are NOT comma separated! You can read about this here:

Store multiple elements in json files in AWS Athena

II Create Statement

Contrary to your screenshot for table creation @abc I used it without nested structs:

CREATE EXTERNAL TABLE IF NOT EXISTS internal_reports (
`Date` Date,
impressions INT,
reach INT,
follower INT,
...
)

III Date property

I came across this whole driver-experimenting thing because the Date property caused issues in my single-line table because my values for Date had time information not equal to midnight. For the other multi-line reports my Dates were ok (the time was exactly midnight). You can read about this here:

https://developer.ibm.com/answers/questions/177238/timestamp-format-must-be-yyyy-mm-dd-hhmmssffffffff/ Athena - DATE column correct values from JSON

Cheers

marc_s
  • 732,580
  • 175
  • 1,330
  • 1,459
p0rter
  • 961
  • 2
  • 13
  • 28