0

I have a file in S3, and Presto running on EMR. I see I can use Json_extract to read the json.

I am running the following query, however, I keep seeing null instead of the correct value.

select json_extract('s3a://random-s3-bucket/analytics/20210221/myjsonfile.json', '$.dateAvailability')

I see this output

enter image description here

Not sure if my syntax is wrong? Thoughts?

Piotr Findeisen
  • 19,480
  • 2
  • 52
  • 82

2 Answers2

1

json_extract() operates on JSON scalar values kept in memory. It does not load data from an external location. See documentation page for usage examples.

In order to query a JSON file using Trino (formerly known as Presto SQL), you need to map it as a table with JSON format like this:

CREATE TABLE my_table ( .... )
WITH (
    format = 'JSON',
    external_location = 's3a://random-s3-bucket/analytics/20210221'
);

See more information in Hive connector documentation.

Piotr Findeisen
  • 19,480
  • 2
  • 52
  • 82
1

If you need a tool to help you create the table statement, try this one: https://www.hivetablegenerator.com

From the page:

Easily convert any JSON (even complex Nested ones), CSV, TSV, or Log sample file to an Apache HiveQL DDL create table statement.

brngyn
  • 96
  • 6