I want to return a large dataset using FastAPI's `StreamingResponse`. In the repository/logic layer, after doing my query stuff, I return the data like this:
```python
for record in query.yield_per(DATA_RECORDS_LIMIT):
    yield record.to_entity()
```
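For context, here's roughly what that layer looks like end to end. All the model/entity names below are made up for illustration, and this sketch assumes SQLAlchemy 1.4+ and Pydantic v1:

```python
from datetime import datetime

from pydantic import BaseModel
from sqlalchemy import Column, DateTime, Integer
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()
DATA_RECORDS_LIMIT = 1000  # hypothetical batch size


class RecordEntity(BaseModel):
    # hypothetical Pydantic entity returned to the web layer
    id: int
    created_at: datetime


class RecordModel(Base):
    # hypothetical SQLAlchemy model backing the query
    __tablename__ = "records"
    id = Column(Integer, primary_key=True)
    created_at = Column(DateTime)

    def to_entity(self) -> RecordEntity:
        return RecordEntity(id=self.id, created_at=self.created_at)


def get_data(session: Session):
    # the actual "query stuff" is elided; a bare query stands in for it
    query = session.query(RecordModel)
    for record in query.yield_per(DATA_RECORDS_LIMIT):
        yield record.to_entity()
```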
I initially had some encoding issues (the encoder method was missing, a `datetime` couldn't be serialised, etc.), and thanks to this https://github.com/encode/starlette/issues/419#issuecomment-470077657 and this https://fastapi.tiangolo.com/tutorial/encoder/#using-the-jsonable_encoder I ended up with this final code in the web handler:
```python
...
results = get_data()

def _encoded_results():
    yield "["
    for idx, item in enumerate(results):
        if idx > 0:
            yield ","
        yield json.dumps(jsonable_encoder(item.dict()))
    yield "]"

return StreamingResponse(_encoded_results())
```
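(As an aside, while writing this up I realised `StreamingResponse` doesn't set a JSON content type by default, so the return line should probably be:)

```python
return StreamingResponse(_encoded_results(), media_type="application/json")
```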
Now... before you ask: yes, it works, but I was wondering if all this is necessary or if there is a better way of doing it. To add more context: the `record` in the first snippet is a SQLAlchemy model instance, and `.to_entity()` transforms it into a Pydantic model instance. In the second snippet I call `.dict()` on the Pydantic instance so I get a Python `dict`, which can be passed through `jsonable_encoder` before going through `json.dumps(...)`.
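To make the need for each step concrete, here's the chain on a single made-up entity; without `jsonable_encoder`, `json.dumps` chokes on the `datetime`:

```python
from datetime import datetime
import json

from fastapi.encoders import jsonable_encoder
from pydantic import BaseModel


class Entity(BaseModel):  # hypothetical stand-in for my real entity
    id: int
    created_at: datetime


item = Entity(id=1, created_at=datetime(2021, 1, 1))

# json.dumps(item.dict()) would raise:
#   TypeError: Object of type datetime is not JSON serializable
encoded = jsonable_encoder(item.dict())
# {'id': 1, 'created_at': '2021-01-01T00:00:00'}
print(json.dumps(encoded))
# {"id": 1, "created_at": "2021-01-01T00:00:00"}
```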
I'm pretty sure I'm not the only one trying to use FastAPI to return/stream a very large dataset, so I'm wondering if there is anything built in, or a better way to do this. Thanks!
Note: my main concern (in case I wasn't clear) is the fact that, given a Pydantic entity, I first need to call the `.dict()` method, then the result needs to go through `jsonable_encoder`, and finally through `json.dumps`. I wish this transformation were implemented somewhere inside FastAPI and hidden from the web handler. TL;DR: `_encoded_results` shouldn't be needed.
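For what it's worth, the closest I've got to collapsing those three steps is leaning on Pydantic's own serialisation (this assumes Pydantic v1, where `BaseModel.json()` handles `datetime` fields out of the box). It shortens the chain, but the wrapper generator is still there:

```python
def _encoded_results():
    yield "["
    for idx, item in enumerate(results):
        if idx > 0:
            yield ","
        # Pydantic v1's .json() serialises datetimes itself, replacing
        # the .dict() / jsonable_encoder / json.dumps chain
        yield item.json()
    yield "]"
```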