Preface: There may not be a solution to this one, but I wanted to create a question here so others can find it later if they run into the same thing.
I have an application which needs to return potentially arbitrarily nested JSON to clients.
Thanks to a lot of help, I managed to get a working query for nesting JSON up to a depth of 3 levels.
- Athena/Presto: SQL for producing array of nested maps/structs from flat rows
- Presto Build JSON Array with Different Data Types
The following example query (users -> todo_lists -> todos
) works on all of Trino/Presto and Athena:
-- sample data
with users (user_id, name) as (values (1, 'Alice'),
(2, 'Bob'),
(3, 'Charlie')),
todo_lists (todo_list_id, user_id, title) as (values (1, 1, 'todo list 1'),
(2, 1, 'todo list 2'),
(3, 2, 'todo list 3'),
(4, 3, 'todo list 4')),
todos (todo_id, todo_list_id, title) as (values (1, 1, 'todo 1'),
(2, 1, 'todo 2'),
(3, 2, 'todo 3'),
(4, 3, 'todo 4'))
-- query
select *
from (select cast(array_agg(
map(array['user_id', 'name', 'todo_lists'],
array[user_id, name, cast(todo_lists as json) ]))
as json)
from (select cast(u.user_id as json) user_id,
cast(max(u.name) as json) name,
cast(array_agg(
map(array['todo_list_id', 'title', 'todos'],
array[cast(tl.todo_list_id as json),
cast(tl.title as json),
cast(
(select array_agg(
map(array['todo_id', 'title'],
array[cast(t.todo_id as json),
cast(t.title as json) ]))
from todos t
where t.todo_list_id = tl.todo_list_id)
as json) ]))
as json) todo_lists
from users u
join todo_lists tl on tl.user_id = u.user_id
group by u.user_id) t) t;
-- results
-- [{"name":"Alice","todo_lists":[{"title":"todo list 2","todo_list_id":2,"todos":[{"title":"todo 3","todo_id":3}]},{"title":"todo list 1","todo_list_id":1,"todos":[{"title":"todo 1","todo_id":1},{"title":"todo 2","todo_id":2}]}],"user_id":1},{"name":"Charlie","todo_lists":[{"title":"todo list 4","todo_list_id":4,"todos":[null]}],"user_id":3},{"name":"Bob","todo_lists":[{"title":"todo list 3","todo_list_id":3,"todos":[{"title":"todo 4","todo_id":4}]}],"user_id":2}]
Now, if we try to add a 4th nesting depth to the query, we get a failure on every engine:
-- sample data
with users (user_id, name) as (values (1, 'Alice'),
(2, 'Bob'),
(3, 'Charlie')),
todo_lists (todo_list_id, user_id, title) as (values (1, 1, 'todo list 1'),
(2, 1, 'todo list 2'),
(3, 2, 'todo list 3'),
(4, 3, 'todo list 4')),
todos (todo_id, todo_list_id, title) as (values (1, 1, 'todo 1'),
(2, 1, 'todo 2'),
(3, 2, 'todo 3'),
(4, 3, 'todo 4')),
todo_items (todo_item_id, todo_id, title) as (values (1, 1, 'todo item 1'),
(2, 1, 'todo item 2'),
(3, 2, 'todo item 3'),
(4, 2, 'todo item 4'),
(5, 3, 'todo item 5'),
(6, 3, 'todo item 6'),
(7, 4, 'todo item 7'),
(8, 4, 'todo item 8'))
-- query
select cast(array_agg(
map(array['user_id', 'name', 'todo_lists'],
array[user_id, name, cast(todo_lists as json) ]))
as json)
from (select cast(user_id as json) user_id,
cast(name as json) name,
cast(todo_lists as json) todo_lists
from (select cast(u.user_id as json) user_id,
cast(max(u.name) as json) name,
cast(array_agg(
map(array['todo_list_id', 'title', 'todos'],
array[cast(tl.todo_list_id as json),
cast(tl.title as json),
cast(
(select array_agg(
map(array['todo_id', 'title', 'todo_items'],
array[cast(t.todo_id as json),
cast(t.title as json),
cast(
(select array_agg(
map(array['todo_item_id', 'title'],
array[cast(ti.todo_item_id as json),
cast(ti.title as json) ]))
from todo_items ti
where ti.todo_id = t.todo_id)
as json) ]))
from todos t
where t.todo_list_id = tl.todo_list_id)
as json) ]))
as json) todo_lists
from users u
join todo_lists tl on tl.user_id = u.user_id
group by u.user_id) t
) t;
I found similar questions, but none with a definitive answer:
- SQL Presto: correlated subquery is not supported
- Correlated subquery pattern is not supported due to internal error - where not exists correlated subquery
- (I don't think using
JOIN
is feasible in the subqueries constructing the objects here but I may be wrong)
- (I don't think using