2

I have a specify use case but my question pertains to the best way of doing this in general.

I have three tables

Order - primary key order_id

OrderLine - Linking table with order_id, product_id and quantity. An order has 1 or more order lines

Product - primary key product_id, each order line has one product

In sqlachemy / python how do I generate nested JSON along the lines of:

{
    "orders": [
        {
            "order_id": 1
            "some_order_level_detail": "Kansas"
            "order_lines": [
                {
                    "product_id": 1,
                    "product_name": "Clawhammer",
                    "quantity": 5
                },
                ...
            ]
        },
        ...
    ]
}

Potential Ideas

Hack away doing successive queries

First idea which I want to get away from if possible is using list comprehesion and a brute force approach.

def get_json():
    answer = {
        "orders": [
            {
                "order_id": o.order_id,
                "some_order_level_detail": o.some_order_level_detail,
                "order_lines": [
                    {
                        "product_id": 1,
                        "product_name": Product.query.get(o_line.product_id).product_name,
                        "quantity": 5
                    }
                    for o_line in OrderLine.query.filter(order_id=o.order_id).all()
                ]
            }
            for o in Order.query.all()
        ]
    }

This gets hard to maintain mixing the queries with json. Ideally I'd like to do a query first...

Get joined results first, somehow manipulate later

The second idea is to do a join query to join the three tables showing per row in OrderLine the order and product details.

My question to pythonista out there is is there a nice way to convert this to nested json.

Another way?

This really seems like such a common requirement I'm really wondering whether there is a book method for this sort of thing? Is there an SQLAchemy version of this

Andrew Allen
  • 6,512
  • 5
  • 30
  • 73

3 Answers3

3

Look into marshmallow-sqlalchemy, as it does exactly what you're looking for.

I strongly advise against baking your serialization directly into your model, as you will eventually have two services requesting the same data, but serialized in a different way (including fewer or more nested relationships for performance, for instance), and you will either end up with either (1) a lot of bugs that your test suite will miss unless you're checking for literally every field or (2) more data serialized than you need and you'll run into performance issues as the complexity of your application scales.

With marshmallow-sqlalchemy, you'll need to define a schema for each model you'd like to serialize. Yes, it's a bit of extra boilerplate, but believe me - you will be much happier in the end.

We build applications using flask-sqlalchemy and marshmallow-sqlalchemy like this (also highly recommend factory_boy so that you can mock your service and write unit tests in place of of integration tests that need to touch the database):

# models

class Parent(Base):
    __tablename__ = 'parent'
    id = Column(Integer, primary_key=True)
    children = relationship("Child", back_populates="parent")

class Child(Base):
    __tablename__ = 'child'
    id = Column(Integer, primary_key=True)
    parent_id = Column(Integer, ForeignKey('parent.id'))
    parent = relationship('Parent', back_populates='children',
                          foreign_keys=[parent_id])

# schemas. Don't put these in your models. Avoid tight coupling here

from marshmallow_sqlalchemy import ModelSchema
import marshmallow as ma


class ParentSchema(ModelSchema):
    children = ma.fields.Nested(
        'myapp.schemas.child.Child', exclude=('parent',), many=True)
    class Meta(ModelSchema.Meta):
        model = Parent
        strict = True
        dump_only = ('id',)


class ChildSchema(ModelSchema):
    parent = ma.fields.Nested(
        'myapp.schemas.parent.Parent', exclude=('children',))
    class Meta(ModelSchema.Meta):
        model = Child
        strict = True
        dump_only = ('id',)

# services

class ParentService:
    '''
    This service intended for use exclusively by /api/parent
    '''
    def __init__(self, params, _session=None):
        # your unit tests can pass in _session=MagicMock()
        self.session = _session or db.session
        self.params = params

    def _parents(self) -> typing.List[Parent]:
        return self.session.query(Parent).options(
            joinedload(Parent.children)
        ).all()

    def get(self):
        schema = ParentSchema(only=(
            # highly recommend specifying every field explicitly
            # rather than implicit
            'id',
            'children.id',
        ))
        return schema.dump(self._parents()).data

# views

@app.route('/api/parent')
def get_parents():
    service = ParentService(params=request.get_json())
    return jsonify(data=service.get())


# test factories
class ModelFactory(SQLAlchemyModelFactory):
    class Meta:
        abstract = True
        sqlalchemy_session = db.session

class ParentFactory(ModelFactory):
    id = factory.Sequence(lambda n: n + 1)
    children = factory.SubFactory('tests.factory.children.ChildFactory')

class ChildFactory(ModelFactory):
    id = factory.Sequence(lambda n: n + 1)
    parent = factory.SubFactory('tests.factory.parent.ParentFactory')

# tests
from unittest.mock import MagicMock, patch

def test_can_serialize_parents():
    parents = ParentFactory.build_batch(4)
    session = MagicMock()
    service = ParentService(params={}, _session=session)
    assert service.session is session
    with patch.object(service, '_parents') as _parents:
        _parents.return_value = parents
        assert service.get()[0]['id'] == parents[0].id
        assert service.get()[1]['id'] == parents[1].id
        assert service.get()[2]['id'] == parents[2].id
        assert service.get()[3]['id'] == parents[3].id
Stephen Fuhry
  • 12,624
  • 6
  • 56
  • 55
  • Thank you for pointing me in a direction. I'm struggling with the related docs and your answer (esp. 2nd paragraph and the code itself). Where is `ParentSchema` coming from in the get method? – Andrew Allen Apr 06 '19 at 16:39
  • Updated that example - I had it inadvertently called it simply `Parent` – Stephen Fuhry Apr 06 '19 at 16:52
  • Regarding my 2nd paragraph recommending against baking serialization directly in the model, I mean I recommend against the pattern where you have your model "serialize itself" by doing `parent.json()` as others have suggested. I've been down that road and it gets nasty – Stephen Fuhry Apr 06 '19 at 17:22
  • 1
    definitely looks like this is the way to go, it has a painful learning curve similar to sqlalchemy itself but you can't deny the results, once again thanks for the pointer – Andrew Allen Apr 06 '19 at 22:12
  • 1
    Definitely a learning curve, but I promise you'll thank me in 2yrs if you're still working on the same project! – Stephen Fuhry Apr 07 '19 at 00:56
1

I would add a .json() method to each model, so that they call each other. It's essentially your "hacked" solution but a bit more readable/maintainable. Your Order model could have:

def json(self):
    return {
        "id": self.id,
        "order_lines": [line.json() for line in self.order_lines]
    }

Your OrderLine model could have:

def json(self):
    return {
        "product_id": self.product_id,
        "product_name": self.product.name,
        "quantity": self.quantity
    }

Your resource at the top level (where you're making the request for orders) could then do:

...
orders = Order.query.all()
return {"orders": [order.json() for order in orders]}
...

This is how I normally structure this JSON requirement.

Jose Salvatierra
  • 2,407
  • 6
  • 21
  • 41
  • Any issues to be aware of as this gets more complex such as many-to-many? I'm likely to make this accepted answer, but want to explore [this](https://trvrm.github.io/using-sqlalchemy-and-postgres-functions-to-produce-json-tree-structures-from-sql-joins.html) first...seems to have advantage of query-first-one-query approach at expense of specifying database as Postgres. – Andrew Allen Apr 05 '19 at 13:43
  • The linked guide seems nice, but less clear and less explicit to me. Also, like you say it limits you to PostgreSQL only. It may be faster since it uses json_agg from SQL instead of doing the JSON conversion yourself. – Jose Salvatierra Apr 05 '19 at 15:22
  • This gets a +1 for use of explicit simple functions over incorporating external frameworks/libraries. It will surely stand up to the test of time better. – nachonachoman Jan 19 '21 at 17:07
0

Check my answer in this thread Flask Sqlalchmey - Marshmallow Nested Schema fails for joins with filter ( where ) conditions and using the Marshmallow package you include in your schema something like this:

name = fields.Nested(Schema, many=True)

Greg Holst
  • 874
  • 10
  • 23