6

I create a table in Athena with below structure

CREATE EXTERNAL TABLE s3_json_objects (
    devId string,
    type string,
    status string
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
WITH SERDEPROPERTIES ( 'ignore.malformed.json' = 'true')
LOCATION 's3://mybucket/folder1/data/athena_test/';

S3 bucket objects contains JSON structure like this

{
    "devId": "00abcdef1122334401",
    "type": "lora",
    "status": "huihuhukiyg"
}

However below SQL working correctly and return the correct result for only count

SELECT count(*) as total_s3_objects FROM "athena_db"."s3_json_objects"

BUT whenever I query below SQL select statement to fetch the JSON values from S3, It's returns result sets with empty values for columns

SELECT devid FROM "athena_db"."s3_json_objects"
SELECT json_extract(devid , '$.devid') as Id FROM "athena_db"."s3_json_objects"
SELECT * FROM "athena_db"."s3_json_objects"

enter image description here

Also, I review these links before post this question on StackOverflow and AWS Athena doc

Can't read json file via Amazon Athena

AWS Athena json_extract query from string field returns empty values

Any comments or suggestions would be much appreciated.

Piotr Findeisen
  • 19,480
  • 2
  • 52
  • 82
SamDev
  • 185
  • 2
  • 11

3 Answers3

3

The JSON must be in a single line, as mentioned in this page of the AWS Athena documentation. You can have multiple JSON objects on separate lines, but each complete object must only span one line.

Example (this could all be in one S3 object):

{"devId": "a1", "type": "b1", "status": "c1"}
{"devId": "a2", "type": "b2", "status": "c2"}
Tom
  • 1,660
  • 8
  • 16
  • My objects span only one line, but it didn't help. If I make query "select * from xxx" if shows messages like this: | 1 |foo | {"int_field":"1","str_field":"hello"} |, and if I make json_extract it shows only numbers. I need to make json, because all numbers are in quotes if make simple select – Sergius Jan 28 '19 at 13:31
  • @Sergius Also, you can't have spaces between your values. Get rid of all the spaces. – h-rai Apr 09 '19 at 21:26
2

Glue can read multi-line json objects because it has spark engine under the hood. One workaround is, transform those json objects to parquet using glue if you can't easily make those json objects on line.

Tanveer Uddin
  • 1,520
  • 9
  • 15
0

Use jsonlines to convert JSON to jsonlines and then Athena will be able to fetch all row.

Somabho
  • 111
  • 1
  • 5