19

I'm using the latest(0.117) Presto and trying to execute CROSS JOIN UNNEST with complex JSON array like this.

[{"id": 1, "value":"xxx"}, {"id":2, "value":"yy"}, ...]

To do that, first I tried to make an ARRAY with the values of id by

SELECT CAST(JSON_EXTRACT('[{"id": 1, "value":"xxx"}, {"id":2, "value":"yy"}]', '$..id') AS ARRAY<BIGINT>)

but it doesn't work.

What is the best JSON Path to extract the values of id?

Damien Carol
  • 1,164
  • 1
  • 11
  • 21
k-kawa
  • 1,289
  • 2
  • 11
  • 18

4 Answers4

19

This will solve your problem. It is more generic cast to an ARRAY of json (less prone to errors given an arbitrary map structure):

select 
        TRANSFORM(CAST(JSON_PARSE(arr1) AS ARRAY<JSON>), 
                   x -> JSON_EXTRACT_SCALAR(x, '$.id'))
from 
       (values ('[{"id": 1, "value":"xxx"}, {"id":2, "value":"yy"}]')) t(arr1)

Output in presto:

 [1,2]

... I ran into a situation where a list of jsons was nested within a json. My list of jsons had an ambiguous nested map structure. The following code returns an array of values given a specific key in a list of jsons.

  1. Extract the list using JSON EXTRACT
  2. Cast the list as an array of jsons
  3. Loop through the json elements in the array using the TRANSFORM function and extract the value of the key that you are interested in.

>

TRANSFORM(CAST(JSON_EXTRACT(json, '$.path.toListOfJSONs') AS ARRAY<JSON>),
          x -> JSON_EXTRACT_SCALAR(x, '$.id')) as id
Attila Dobi
  • 191
  • 1
  • 2
14

You can cast the JSON into an ARRAY of MAP, and use transform lambda function to extract the "id" key:

select 
        TRANSFORM(CAST(JSON_PARSE(arr1) AS ARRAY<MAP<VARCHAR, VARCHAR>>), entry->entry['id'])
from 
       (values ('[{"id": 1, "value":"xxx"}, {"id":2, "value":"yy"}]')) t(arr1)

output:

 [1, 2]
FreePeter
  • 752
  • 6
  • 6
  • thanks a lot! I had no idea on `lambda` https://prestodb.io/docs/current/functions/lambda.html – k-kawa Jun 02 '17 at 04:53
5

Now, you can use presto-third-functions , It provide json_array_extract function, you can extract json array info like this:

    select 
           json_array_extract_scalar(arr1, '$.book.id') 
    from 
           (values ('[{"book":{"id":"12"}}, {"book":{"id":"14"}}]')) t(arr1)

output is:

    [12, 14]
aaronshan
  • 382
  • 3
  • 9
  • Here's the readme in English https://github.com/aaronshan/presto-third-functions/blob/master/README-en.md – Pierre L Apr 01 '17 at 14:42
2

I finally gave up finding a simple JSON Path to extract them.

Instead, I wrote a redundant dirty query like the following to make the task done.

SELECT
  ...
FROM
  (
    SELECT
      SLICE(ARRAY[
        JSON_EXTRACT(json_column, '$[0].id'),
        JSON_EXTRACT(json_column, '$[1].id'),
        JSON_EXTRACT(json_column, '$[2].id'),
        ...
      ], JSON_ARRAY_LENGTH(json_column)) ids
    FROM
     the.table
  ) t1
  CROSS JOIN UNNEST(ids) AS t2(id)
WHERE
  ...

I still want to know the best practice if you know another good way to CROSS JOIN them!

k-kawa
  • 1,289
  • 2
  • 11
  • 18