I'm abusing the answer field to have more space and to have my thoughts on this a little bit structured. I hope this is useful input to anyone using Athena.
I'm using Athena to create two tables. Single-line-json-based and multi-line-json-based reports in two separate bucket folders and two according tables.
Single-line reports in JSON are straightforward. For table creation I use this driver:
ROW FORMAT serde 'org.apache.hive.hcatalog.data.JsonSerDe'
However if you use this exact one to consume multi-line you will get an error PLUS there are three more things (I, II and III) to be aware of. First of all the different driver:
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
then
I the (invalid) JSON format. My multi-line files look like this:
{"key": "value"}
{"key": "value"}
Notice there is one JSON object per line and objects are NOT comma separated!
You can read about this here:
Store multiple elements in json files in AWS Athena
II Create Statement
Contrary to your screenshot for table creation @abc I used it without nested structs:
CREATE EXTERNAL TABLE IF NOT EXISTS internal_reports (
`Date` Date,
impressions INT,
reach INT,
follower INT,
...
)
III Date property
I came across this whole driver-experimenting thing because the Date property caused issues in my single-line table because my values for Date had time information not equal to midnight. For the other multi-line reports my Dates were ok (the time was exactly midnight). You can read about this here:
https://developer.ibm.com/answers/questions/177238/timestamp-format-must-be-yyyy-mm-dd-hhmmssffffffff/
Athena - DATE column correct values from JSON
Cheers