1

I have two tables Orders and OrderItems. It's a common setup whereby OrderItems has a foreign key linking it to Orders. So we have a one-to-many join from Orders to OrderItems. Note: Tables would have many more fields in real life.

Orders         OrderItems
+---------+   +-------------+---------+
| orderId |   | orderItemId | orderId |
+---------+   +-------------+---------+
|       1 |   |           5 |       1 |
|       2 |   |           6 |       1 |
|         |   |           7 |       2 |
+---------+   +-------------+---------+

I'm using SQLAlchemy to reflect an existing database. So to query this data I do something like

ordersTable = db.Model.metadata.tables['Orders']
orderItemsTable = db.Model.metadata.tables['OrdersItems']
statement = ordersTable.join(orderItemsTable, ordersTable.c.orderId==orderItemsTable.c.orderId).select()
result = db.engine.execute(statement)
rlist = [dict(row) for row in result.fetchall()]

return flask.jsonify(rlist)

But the problem with this output is that I get duplicates of information from the Orders table due to the join. E.g. you can see that because orderId has two items I'll get everything in the Orders table twice.

What I'm after is a way to obtain a nested JSON output from the select query aboce. Such as:

[
    {
        "orderId": 1, 
        "orderItems": [
            { "orderItemId": 5 },
            { "orderItemId": 6 }
        ]
    },
    {
        "orderId": 2,
        "orderItems":[
            { "orderItemId": 7 }
        ]
    }
]

This question has been raised before How do I produce nested JSON from database query with joins? Using Python / SQLAlchemy I've spent quite a bit of time looking over the Marshmallow documentation, but I cannot find how to implement this using the type of query that I outlined above.

davidism
  • 121,510
  • 29
  • 395
  • 339
Anti-Distinctlyminty
  • 2,615
  • 5
  • 22
  • 24

1 Answers1

0

I didn't like how cluttered marshmallow is, so I wrote this. I also like that I can keep all of the data manipulation in the SQL statement instead of also instructing marshmallow what to do.

import json
from flask.json import JSONEncoder

def join_to_nested_dict(join_result):
    """
    Takes a sqlalchemy result and converts it to a dictionary.
    The models must use the dataclass decorator.
    Adds results to the right in a key named after the table the right item is contained in.
    :param List[Tuple[dataclass]] join_result:
    :return dict:
    """
    if len(join_result) == 0:
        return join_result

    # couldn't be the result of a join without two entries on each row
    assert(len(join_result[0]) >= 2)

    right_name = join_result[0][1].__tablename__
    # if there are multiple joins recurse on sub joins
    if len(join_result[0]) > 2:
        right = join_to_nested_dict([res[1:] for res in join_result])
    elif len(join_result[0]) == 2:
        right = [
            json.loads(json.dumps(row[1], cls=JSONEncoder))
            for row in join_result if row[1] is not None
        ]
    right_items = {item['id']: item for item in right}

    items = {}
    for row in join_result:
        # in the case of a right outer join
        if row[0] is None:
            continue
        if row[0].id not in items:
            items[row[0].id] = json.loads(json.dumps(row[0], cls=JSONEncoder))
        # in the case of a left outer join
        if row[1] is None:
            continue
        if right_name not in items[row[0].id]:
            items[row[0].id][right_name] = []
        items[row[0].id][right_name].append(right_items[row[1].id])
    return list(items.values())

And you should be able to just plug the result into this function. However you will need to add the dataclass decorator to your models for this code to work.

statement = ordersTable.join(orderItemsTable, ordersTable.c.orderId==orderItemsTable.c.orderId).select()
result = db.engine.execute(statement)
join_to_nested_dict(result)

Also, if you don't want to use the flask json encoder you can delete the import and cls arguments.

agiera
  • 1