I want to return a large dataset using FastAPI's `StreamingResponse`. In the repository/logic layer, after doing my query stuff, I return the data like this:
```python
for record in query.yield_per(DATA_RECORDS_LIMIT):
    yield record.to_entity()
```
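For context, here's roughly what that layer looks like end to end. All the model/entity names below are made up for illustration, and this sketch assumes SQLAlchemy 1.4+ and Pydantic v1:

```python
from datetime import datetime

from pydantic import BaseModel
from sqlalchemy import Column, DateTime, Integer
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()
DATA_RECORDS_LIMIT = 1000  # hypothetical batch size


class RecordEntity(BaseModel):
    # hypothetical Pydantic entity returned to the web layer
    id: int
    created_at: datetime


class RecordModel(Base):
    # hypothetical SQLAlchemy model backing the query
    __tablename__ = "records"
    id = Column(Integer, primary_key=True)
    created_at = Column(DateTime)

    def to_entity(self) -> RecordEntity:
        return RecordEntity(id=self.id, created_at=self.created_at)


def get_data(session: Session):
    # the actual "query stuff" is elided; a bare query stands in for it
    query = session.query(RecordModel)
    for record in query.yield_per(DATA_RECORDS_LIMIT):
        yield record.to_entity()
```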
I initially had some encoding issues (the encoder method was missing, a `datetime` couldn't be serialised, etc.), and thanks to this https://github.com/encode/starlette/issues/419#issuecomment-470077657 and this https://fastapi.tiangolo.com/tutorial/encoder/#using-the-jsonable_encoder I ended up with this final code in the web handler:
```python
...
results = get_data()

def _encoded_results():
    yield "["
    for idx, item in enumerate(results):
        if idx > 0:
            yield ","
        yield json.dumps(jsonable_encoder(item.dict()))
    yield "]"

return StreamingResponse(_encoded_results())
```
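(As an aside, while writing this up I realised `StreamingResponse` doesn't set a JSON content type by default, so the return line should probably be:)

```python
return StreamingResponse(_encoded_results(), media_type="application/json")
```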
Now... before you ask: yes, it works, but I was wondering if all this is necessary or if there is a better way of doing it. To add more context: the `record` in the first snippet is a SQLAlchemy model instance, and `.to_entity()` transforms it into a Pydantic model instance. In the second snippet I call `.dict()` on the Pydantic instance so I get a Python `dict`, which can be passed through `jsonable_encoder` before going through `json.dumps(...)`.
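To make the need for each step concrete, here's the chain on a single made-up entity; without `jsonable_encoder`, `json.dumps` chokes on the `datetime`:

```python
from datetime import datetime
import json

from fastapi.encoders import jsonable_encoder
from pydantic import BaseModel


class Entity(BaseModel):  # hypothetical stand-in for my real entity
    id: int
    created_at: datetime


item = Entity(id=1, created_at=datetime(2021, 1, 1))

# json.dumps(item.dict()) would raise:
#   TypeError: Object of type datetime is not JSON serializable
encoded = jsonable_encoder(item.dict())
# {'id': 1, 'created_at': '2021-01-01T00:00:00'}
print(json.dumps(encoded))
# {"id": 1, "created_at": "2021-01-01T00:00:00"}
```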
I'm pretty sure I'm not the only one trying to use FastAPI to return/stream a very large dataset, so I'm wondering if there is anything built in, or a better way to do this. Thanks!
Note: my main concern (in case I wasn't clear) is the fact that, given a Pydantic entity, I first need to call the `.dict()` method, then the result needs to go through `jsonable_encoder`, and finally through `json.dumps`. I wish this transformation were implemented somewhere inside FastAPI and hidden from the web handler. TL;DR: `_encoded_results` shouldn't be needed.
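For what it's worth, the closest I've got to collapsing those three steps is leaning on Pydantic's own serialisation (this assumes Pydantic v1, where `BaseModel.json()` handles `datetime` fields out of the box). It shortens the chain, but the wrapper generator is still there:

```python
def _encoded_results():
    yield "["
    for idx, item in enumerate(results):
        if idx > 0:
            yield ","
        # Pydantic v1's .json() serialises datetimes itself, replacing
        # the .dict() / jsonable_encoder / json.dumps chain
        yield item.json()
    yield "]"
```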