0

Preface: There may not be a solution to this one, but I wanted to create a question here so others can find it later if they run into the same thing.

I have an application which needs to return potentially arbitrarily nested JSON to clients.

Thanks to a lot of help, I managed to get a working query for nesting JSON up to a depth of 3 levels.

The following example query (users -> todo_lists -> todos) works on all of Trino/Presto and Athena:

-- sample data
with users (user_id, name) as (values (1, 'Alice'),
                                  (2, 'Bob'),
                                  (3, 'Charlie')),
 todo_lists (todo_list_id, user_id, title) as (values (1, 1, 'todo list 1'),
                                                      (2, 1, 'todo list 2'),
                                                      (3, 2, 'todo list 3'),
                                                      (4, 3, 'todo list 4')),
 todos (todo_id, todo_list_id, title) as (values (1, 1, 'todo 1'),
                                                 (2, 1, 'todo 2'),
                                                 (3, 2, 'todo 3'),
                                                 (4, 3, 'todo 4'))
-- query
select *
from (select cast(array_agg(
    map(array['user_id', 'name', 'todo_lists'],
        array[user_id, name, cast(todo_lists as json) ]))
             as json)
  from (select cast(u.user_id as json)   user_id,
               cast(max(u.name) as json) name,
               cast(array_agg(
                       map(array['todo_list_id', 'title', 'todos'],
                           array[cast(tl.todo_list_id as json),
                           cast(tl.title as json),
                           cast(
                                   (select array_agg(
                                                   map(array['todo_id', 'title'],
                                                       array[cast(t.todo_id as json),
                                                       cast(t.title as json) ]))
                                    from todos t
                                    where t.todo_list_id = tl.todo_list_id)
                               as json) ]))
                   as json)              todo_lists
        from users u
                 join todo_lists tl on tl.user_id = u.user_id
        group by u.user_id) t) t;

-- results
-- [{"name":"Alice","todo_lists":[{"title":"todo list 2","todo_list_id":2,"todos":[{"title":"todo 3","todo_id":3}]},{"title":"todo list 1","todo_list_id":1,"todos":[{"title":"todo 1","todo_id":1},{"title":"todo 2","todo_id":2}]}],"user_id":1},{"name":"Charlie","todo_lists":[{"title":"todo list 4","todo_list_id":4,"todos":[null]}],"user_id":3},{"name":"Bob","todo_lists":[{"title":"todo list 3","todo_list_id":3,"todos":[{"title":"todo 4","todo_id":4}]}],"user_id":2}]

Now, if we try to add a 4th nesting depth to the query, we get a failure on every engine:

-- sample data
with users (user_id, name) as (values (1, 'Alice'),
                                  (2, 'Bob'),
                                  (3, 'Charlie')),
 todo_lists (todo_list_id, user_id, title) as (values (1, 1, 'todo list 1'),
                                                      (2, 1, 'todo list 2'),
                                                      (3, 2, 'todo list 3'),
                                                      (4, 3, 'todo list 4')),
 todos (todo_id, todo_list_id, title) as (values (1, 1, 'todo 1'),
                                                 (2, 1, 'todo 2'),
                                                 (3, 2, 'todo 3'),
                                                 (4, 3, 'todo 4')),
 todo_items (todo_item_id, todo_id, title) as (values (1, 1, 'todo item 1'),
                                                      (2, 1, 'todo item 2'),
                                                      (3, 2, 'todo item 3'),
                                                      (4, 2, 'todo item 4'),
                                                      (5, 3, 'todo item 5'),
                                                      (6, 3, 'todo item 6'),
                                                      (7, 4, 'todo item 7'),
                                                      (8, 4, 'todo item 8'))
-- query
select cast(array_agg(
    map(array['user_id', 'name', 'todo_lists'],
        array[user_id, name, cast(todo_lists as json) ]))
       as json)
from (select cast(user_id as json)   user_id,
         cast(name as json) name,
         cast(todo_lists as json)   todo_lists
    from (select cast(u.user_id as json)   user_id,
                 cast(max(u.name) as json) name,
                 cast(array_agg(
                         map(array['todo_list_id', 'title', 'todos'],
                             array[cast(tl.todo_list_id as json),
                             cast(tl.title as json),
                             cast(
                                     (select array_agg(
                                                     map(array['todo_id', 'title', 'todo_items'],
                                                         array[cast(t.todo_id as json),
                                                         cast(t.title as json),
                                                         cast(
                                                                 (select array_agg(
                                                                                 map(array['todo_item_id', 'title'],
                                                                                     array[cast(ti.todo_item_id as json),
                                                                                     cast(ti.title as json) ]))
                                                                  from todo_items ti
                                                                  where ti.todo_id = t.todo_id)
                                                             as json) ]))
                                      from todos t
                                      where t.todo_list_id = tl.todo_list_id)
                                 as json) ]))
                     as json)              todo_lists
          from users u
                   join todo_lists tl on tl.user_id = u.user_id
          group by u.user_id) t
    ) t;
  • Trino v371 trino query failure

  • Athena v2 (Presto v0.217) athena query failure

I found similar questions, but none with a definitive answer:

Gavin Ray
  • 595
  • 1
  • 3
  • 10
  • Do you have some source JSON with nested JSON more than 3 levels or you just trying to generate the JSON with select statement? If you want to parse nested JSON update your question with sample JSON records and expected output. If not explain a bit more about your use case on why you want to generate JSON with select statement. – Prabhakar Reddy Sep 20 '22 at 01:01
  • The rows in the table will be flat, and the server must return these flat rows nested into JSON based on relationships to the client. The server application does not have a specific schema, but lets you query data that already exists (so it is generated on-the-fly). So, it's not possible to know how many levels of nesting a query might have from a client and their dataset unfortunately =( – Gavin Ray Sep 20 '22 at 01:27

1 Answers1

0

To update this:

  • The creator of Presto/Trino said that this behavior is not intentional and is a bug in the engine:

enter image description here

The issue is here: https://github.com/trinodb/trino/issues/14199

There is an alternate way of writing queries that doesn't require correlation I have attached in the issues thread as a reply.

Gavin Ray
  • 595
  • 1
  • 3
  • 10