
I'm using SQLAlchemy's ORM, and I find that when I return a single column the results come back like so:

[(result,), (result_2,)] # etc...

With a result set like this, I find I often have to do:

results = [r[0] for r in results] # So that I just have a list of result values

This isn't that "bad" because my result sets are usually small, but if they weren't, this could add significant overhead. The bigger issue is that it clutters the source, and forgetting this step is a pretty common error I run into.

Is there any way to avoid this extra step?

A related aside: this behaviour of the ORM seems inconvenient in this case, but in another case, where my result set was [(id, value)], it comes back like this:

[(result_1_id, result_1_val), (result_2_id, result_2_val)]

I then can just do:

results = dict(results) # so I have a map of id to value

This one has the advantage of making sense as a useful step after returning the results.
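For illustration, that dict step with plain tuples standing in for the ORM rows:

```python
# Plain tuples standing in for the (id, value) rows the ORM returns
results = [(1, "alpha"), (2, "beta")]
mapping = dict(results)  # {1: "alpha", 2: "beta"}
```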

Is this really a problem, or am I just nitpicking, and the post-processing after getting the result set makes sense in both cases? I'm sure we can think of other common post-processing operations that make the result set more usable in application code. Are there high-performance and convenient solutions across the board, or is post-processing unavoidable and simply dictated by how each application uses the data?

When my application can actually take advantage of the objects returned by SQLAlchemy's ORM, it seems extremely helpful, but in cases where I can't or don't, not so much. Is this just a common problem with ORMs in general? Am I better off not using the ORM layer in cases like this?

I suppose I should show an example of the actual ORM queries I'm talking about:

session.query(OrmObj.column_name).all()

or

session.query(OrmObj.id_column_name, OrmObj.value_column_name).all()

Of course, in a real query there'd normally be some filters, etc.

Derek Litz

7 Answers


One way to decrease the clutter in the source is to iterate like this:

results = [r for (r, ) in results]

Although this solution is one character longer than using the [] operator, I think it's easier on the eyes.

For even less clutter, remove the parentheses. This makes it harder to notice, when reading the code, that you're actually handling tuples, though:

results = [r for r, in results]
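A runnable sketch of the pattern with plain tuples in place of the ORM result:

```python
# One-element rows, as returned by a single-column query
results = [("a",), ("b",), ("c",)]

# Unpacking the 1-tuple directly in the comprehension target
flat = [r for (r,) in results]  # ["a", "b", "c"]
```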
Dag Høidahl

Python's zip combined with the * argument-unpacking operator is a pretty handy solution to this:

>>> results = [('result',), ('result_2',), ('result_3',)]
>>> zip(*results)
[('result', 'result_2', 'result_3')]

Then you only have to index in with [0] once. For such a short list, your comprehension is faster:

>>> timeit('result = zip(*[("result",), ("result_2",), ("result_3",)])', number=10000)
0.010490894317626953
>>> timeit('result = [ result[0] for result in [("result",), ("result_2",), ("result_3",)] ]', number=10000)
0.0028390884399414062

However, for longer lists, zip should be faster:

>>> timeit('result = zip(*[(1,)]*100)', number=10000)
0.049577951431274414
>>> timeit('result = [ result[0] for result in [(1,)]*100 ]', number=10000)
0.11178708076477051

So it's up to you to determine which is better for your situation.
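Note that the transcripts above are from Python 2, where zip returns a list. In Python 3, zip returns an iterator, so materialize it before indexing:

```python
results = [("result",), ("result_2",), ("result_3",)]

# Python 3: zip() is lazy, so wrap it in list() before indexing
flat = list(zip(*results))[0]  # ('result', 'result_2', 'result_3')
```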

Beright

I struggled with this too until I realized it's just like any other query:

for result in results:
    print(result.column_name)
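This works because the rows returned by a column query behave like named tuples. A plain-Python stand-in (the `Row` and `column_name` names here are illustrative, not SQLAlchemy's API):

```python
from collections import namedtuple

# Illustrative stand-in for SQLAlchemy's named-tuple-like rows
Row = namedtuple("Row", ["column_name"])
results = [Row("first"), Row("second")]

values = [result.column_name for result in results]  # ["first", "second"]
```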
Pakman
  • Yeah, I agree, the NamedTuples are not immediately obvious. My most common post processing is still having to create a dictionary of some sorts. Most of my post processing has been eliminated by better database architecture and SQLAlchemy usage so I have the values I need attached to ORM objects, or available through methods. – Derek Litz Apr 25 '13 at 21:45

Starting from version 1.4, SQLAlchemy provides a method to retrieve the results of a single column as a list of scalar values:

# ORM
>>> session.scalars(select(User.name)).all()
['ed', 'wendy', 'mary', 'fred']
# or
>>> query = session.query(User.name)
>>> session.scalars(query).all()
['ed', 'wendy', 'mary', 'fred']

# Core
>>> with engine.connect() as connection:
...     result = connection.execute(text("select name from users"))
...     result.scalars().all()
... 
['ed', 'wendy', 'mary', 'fred']

See the SQLAlchemy documentation.

b3000
  • This method is not present in a query object, only after running a literal SQL statement. – josemfc May 13 '22 at 12:08
  • @josemfc you can do `session.scalars(session.query(MyModel.id))` or `session.scalars(select(MyModel.id))` if you are using the ORM. – snakecharmerb Sep 04 '22 at 09:00
  • @b3000 it would be great if you could update your answer to cover ORM models and Core tables. – snakecharmerb Sep 04 '22 at 09:03
  • @snakecharmerb thanks for your comments! I updated the answer accordingly. – b3000 Sep 07 '22 at 21:48
  • Note that the snippet submitted by @snakecharmerb raises a deprecation warning: `SADeprecationWarning: Object should not be used directly in a SQL statement context, such as passing to methods such as session.execute(). This usage will be disallowed in a future release...` – edg Oct 06 '22 at 09:08
  • Good answer. Just a side note: if you select multiple columns, `.scalars().all()` returns the first column only, so `select(User.name, User.id)` has the same ScalarResult as `select(User.name)`. However, when selecting whole entities with `select(User)`, the ScalarResult is helpful again, as it returns a flattened list of `User` objects instead of Row tuples. – Kim Nov 15 '22 at 22:09

I found the following more readable; it also covers the dict case (in Python 2.7):

d = {id_: name for id_, name in session.query(Customer.id, Customer.name).all()}
l = [r.id for r in session.query(Customer).all()]

For the single value, borrowing from another answer:

l = [name for (name, ) in session.query(Customer.name).all()]

Compare with the built-in zip solution, adapted to return a list:

l = list(zip(*session.query(Customer.id).all())[0])

which in my timeits provides only about a 4% speed improvement.

Roman Susi

My solution looks like this ;)

def column(self):
    # Extended unpacking discards any extra columns in each row
    for value, *_ in Model.query.with_entities(Model.column).all():
        yield value

NOTE: Python 3 only (uses extended unpacking).
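Outside the ORM, the same generator pattern with plain tuples:

```python
def first_column(rows):
    # Yield the first element of each row; extra columns are discarded
    for value, *_ in rows:
        yield value

flat = list(first_column([(1, "x"), (2, "y")]))  # [1, 2]
```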

brunsgaard

Wow, guys, why strain? There's a slicker, faster, and more elegant way:

>>> results = [('result',), ('result_2',), ('result_3',)]
>>> sum(results, tuple())
('result', 'result_2', 'result_3')

Speed:

>>> timeit('result = zip(*[("result",), ("result_2",), ("result_3",)])', number=10000)
0.004222994000883773
>>> timeit('result = sum([("result",), ("result_2",), ("result_3",)], ())', number=10000)
0.0038205889868550003

But if the list has more elements, use only zip; zip is faster.
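As the comments point out, sum() concatenation is quadratic. For the record, the standard linear-time way to flatten a list of 1-tuples without zip is itertools.chain.from_iterable:

```python
from itertools import chain

results = [("result",), ("result_2",), ("result_3",)]
# Linear-time flattening; each element is copied exactly once
flat = list(chain.from_iterable(results))  # ['result', 'result_2', 'result_3']
```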

neonua
  • This takes quadratic time, as you create a new tuple for every sum step. For 1000 elements, you create an empty tuple, then a tuple with 1 element, then a tuple with 2 elements, etc. Each new tuple has to copy all the references from the previous tuple first, so in the end you make on the order of N² copies of the references. **Do not use `sum()` to concatenate sequences**. – Martijn Pieters Aug 16 '18 at 15:12
  • That the 3-element sum is faster is luck; you should really time this with much, much larger inputs. – Martijn Pieters Aug 16 '18 at 15:13