
I'm using SQLAlchemy's ORM, and I find that when I return a single column the results come back like so:

[(result,), (result_2,)] # etc...

With a result set like this, I find I often have to do:

results = [r[0] for r in results] # So that I just have a list of result values

This isn't that "bad" because my result sets are usually small, but if they weren't, this could add significant overhead. The bigger issue is that it clutters the source, and forgetting this step is a pretty common error I run into.

Is there any way to avoid this extra step?

A related aside: this behaviour of the ORM seems inconvenient in this case, but in another case, where my result set was [(id, value)], it comes back like this:

[(result_1_id, result_1_val), (result_2_id, result_2_val)]

I then can just do:

results = dict(results) # so I have a map of id to value

This one has the advantage of making sense as a useful step after returning the results.
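For illustration, that dict step with plain tuples standing in for the ORM rows:

```python
# Plain tuples standing in for the (id, value) rows the ORM returns
results = [(1, "alpha"), (2, "beta")]
mapping = dict(results)  # {1: "alpha", 2: "beta"}
```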

Is this really a problem, or am I just nitpicking, and the post-processing after getting the result set makes sense in both cases? I'm sure we can think of other common post-processing operations that make the result set more usable in application code. Are there high-performance and convenient solutions across the board, or is post-processing unavoidable and simply dictated by how each application uses the data?

When my application can actually take advantage of the objects returned by SQLAlchemy's ORM, it seems extremely helpful, but in cases where I can't or don't, not so much. Is this just a common problem with ORMs in general? Am I better off not using the ORM layer in cases like this?

I suppose I should show an example of the actual ORM queries I'm talking about:

session.query(OrmObj.column_name).all()

or

session.query(OrmObj.id_column_name, OrmObj.value_column_name).all()

Of course, in a real query there'd normally be some filters, etc.

Derek Litz

7 Answers


One way to decrease the clutter in the source is to iterate like this:

results = [r for (r, ) in results]

Although this solution is one character longer than using the [] operator, I think it's easier on the eyes.

For even less clutter, remove the parentheses. This makes it harder to notice, when reading the code, that you're actually handling tuples, though:

results = [r for r, in results]
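A runnable sketch of the pattern with plain tuples in place of the ORM result:

```python
# One-element rows, as returned by a single-column query
results = [("a",), ("b",), ("c",)]

# Unpacking the 1-tuple directly in the comprehension target
flat = [r for (r,) in results]  # ["a", "b", "c"]
```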
Dag Høidahl

Python's zip combined with the * argument-unpacking operator is a pretty handy solution to this:

>>> results = [('result',), ('result_2',), ('result_3',)]
>>> zip(*results)
[('result', 'result_2', 'result_3')]

Then you only have to index in with [0] once. For such a short list, your comprehension is faster:

>>> timeit('result = zip(*[("result",), ("result_2",), ("result_3",)])', number=10000)
0.010490894317626953
>>> timeit('result = [ result[0] for result in [("result",), ("result_2",), ("result_3",)] ]', number=10000)
0.0028390884399414062

However, for longer lists, zip should be faster:

>>> timeit('result = zip(*[(1,)]*100)', number=10000)
0.049577951431274414
>>> timeit('result = [ result[0] for result in [(1,)]*100 ]', number=10000)
0.11178708076477051

So it's up to you to determine which is better for your situation.
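Note that the transcripts above are from Python 2, where zip returns a list. In Python 3, zip returns an iterator, so materialize it before indexing:

```python
results = [("result",), ("result_2",), ("result_3",)]

# Python 3: zip() is lazy, so wrap it in list() before indexing
flat = list(zip(*results))[0]  # ('result', 'result_2', 'result_3')
```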

Beright

I struggled with this too until I realized it's just like any other query:

for result in results:
    print(result.column_name)
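This works because the rows returned by a column query behave like named tuples. A plain-Python stand-in (the `Row` and `column_name` names here are illustrative, not SQLAlchemy's API):

```python
from collections import namedtuple

# Illustrative stand-in for SQLAlchemy's named-tuple-like rows
Row = namedtuple("Row", ["column_name"])
results = [Row("first"), Row("second")]

values = [result.column_name for result in results]  # ["first", "second"]
```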
Pakman
  • Yeah, I agree, the NamedTuples are not immediately obvious. My most common post processing is still having to create a dictionary of some sorts. Most of my post processing has been eliminated by better database architecture and SQLAlchemy usage so I have the values I need attached to ORM objects, or available through methods. – Derek Litz Apr 25 '13 at 21:45

Starting from version 1.4, SQLAlchemy provides a method to retrieve the results of a single column as a list of scalar values:

# ORM
>>> session.scalars(select(User.name)).all()
['ed', 'wendy', 'mary', 'fred']
# or
>>> query = session.query(User.name)
>>> session.scalars(query).all()
['ed', 'wendy', 'mary', 'fred']

# Core
>>> with engine.connect() as connection:
...     result = connection.execute(text("select name from users"))
...     result.scalars().all()
... 
['ed', 'wendy', 'mary', 'fred']

See the SQLAlchemy documentation.

b3000
  • This method is not present in a query object, only after running a literal SQL statement. – josemfc May 13 '22 at 12:08
  • @josemfc you can do `session.scalars(session.query(MyModel.id))` or `session.scalars(select(MyModel.id))` if you are using the ORM. – snakecharmerb Sep 04 '22 at 09:00
  • @b3000 it would be great if you could update your answer to cover ORM models and Core tables. – snakecharmerb Sep 04 '22 at 09:03
  • @snakecharmerb thanks for your comments! I updated the answer accordingly. – b3000 Sep 07 '22 at 21:48
  • Note that the snippet submitted by @snakecharmerb raises a deprecation warning: `SADeprecationWarning: Object should not be used directly in a SQL statement context, such as passing to methods such as session.execute(). This usage will be disallowed in a future release...` – edg Oct 06 '22 at 09:08
  • Good answer. Just a side note: if you select multiple columns, `.scalars().all()` returns the first column only, so `select(User.name, User.id)` has the same ScalarResult as `select(User.name)`. However, when selecting whole entities with `select(User)`, the ScalarResult is helpful again, as it returns a flattened list of `User` objects instead of Row tuples. – Kim Nov 15 '22 at 22:09

I found the following more readable; it also covers the dict case (in Python 2.7):

d = {id_: name for id_, name in session.query(Customer.id, Customer.name).all()}
l = [r.id for r in session.query(Customer).all()]

For the single value, borrowing from another answer:

l = [name for (name, ) in session.query(Customer.name).all()]

Compare with the built-in zip solution, adapted to return a list:

l = list(zip(*session.query(Customer.id).all())[0])

which in my timeits provides only about a 4% speed improvement.

Roman Susi

My solution looks like this ;)

def column(self):
    # Extended unpacking discards any extra columns in each row
    for value, *_ in Model.query.with_entities(Model.column).all():
        yield value

NOTE: Python 3 only (uses extended unpacking).
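Outside the ORM, the same generator pattern with plain tuples:

```python
def first_column(rows):
    # Yield the first element of each row; extra columns are discarded
    for value, *_ in rows:
        yield value

flat = list(first_column([(1, "x"), (2, "y")]))  # [1, 2]
```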

brunsgaard

Wow, guys, why strain? There's a slicker, faster, and more elegant way:

>>> results = [('result',), ('result_2',), ('result_3',)]
>>> sum(results, tuple())
('result', 'result_2', 'result_3')

Speed:

>>> timeit('result = zip(*[("result",), ("result_2",), ("result_3",)])', number=10000)
0.004222994000883773
>>> timeit('result = sum([("result",), ("result_2",), ("result_3",)], ())', number=10000)
0.0038205889868550003

But if the list has more elements, use only zip; zip is faster.
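As the comments point out, sum() concatenation is quadratic. For the record, the standard linear-time way to flatten a list of 1-tuples without zip is itertools.chain.from_iterable:

```python
from itertools import chain

results = [("result",), ("result_2",), ("result_3",)]
# Linear-time flattening; each element is copied exactly once
flat = list(chain.from_iterable(results))  # ['result', 'result_2', 'result_3']
```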

neonua
  • This takes quadratic time, as you create a new tuple for every sum step. For 1000 elements, you create an empty tuple, then a tuple with 1 element, then a tuple with 2 elements, etc. Each new tuple has to copy all the references from the previous tuple first, so in the end you make on the order of N² copies of the references. **Do not use `sum()` to concatenate sequences**. – Martijn Pieters Aug 16 '18 at 15:12
  • That the 3-element sum is faster is luck; you should really time this with much, much larger inputs. – Martijn Pieters Aug 16 '18 at 15:13