In Django, what is the most efficient way to check for an empty query set?

Question

I've heard suggestions to use the following:

if qs.exists():
    ...

if qs.count():
    ...

try:
    qs[0]
except IndexError:
    ...

Copied from comment below: "I'm looking for a statement like "In MySQL and PostgreSQL count() is faster for short queries, exists() is faster for long queries, and use QuerySet[0] when it's likely that you're going to need the first element and you want to check that it exists. However, when count() is faster it's only marginally faster so it's advisable to always use exists() when choosing between the two."

Chris Sattinger · Answer 1 · 2017-03-02T12:26:00.787

query.exists() is the most efficient way.

Especially on postgres count() can be very expensive, sometimes more expensive then a normal select query.

exists() runs a query with no select_related, field selections or sorting and only fetches a single record. This is much faster then counting the entire query with table joins and sorting.

qs[0] would still includes select_related, field selections and sorting; so it would be more expensive.

The Django source code is here (django/db/models/sql/query.py RawQuery.has_results):

https://github.com/django/django/blob/60e52a047e55bc4cd5a93a8bd4d07baed27e9a22/django/db/models/sql/query.py#L499

def has_results(self, using):
    q = self.clone()
    if not q.distinct:
        q.clear_select_clause()
    q.clear_ordering(True)
    q.set_limits(high=1)
    compiler = q.get_compiler(using=using)
    return compiler.has_results()

Another gotcha that got me the other day is invoking a QuerySet in an if statement. That executes and returns the whole query !

If the variable query_set may be None (unset argument to your function) then use:

if query_set is None:
    #

not:

if query_set:
   # you just hit the database

Sam Odio · Accepted Answer · 2020-02-08T22:40:18.607

12

exists() is generally faster than count(), though not always (see test below). count() can be used to check for both existence and length.

Only use qs[0]if you actually need the object. It's significantly slower if you're just testing for existence.

On Amazon SimpleDB, 400,000 rows:

bare qs: 325.00 usec/pass
qs.exists(): 144.46 usec/pass
qs.count() 144.33 usec/pass
qs[0]: 324.98 usec/pass

On MySQL, 57 rows:

bare qs: 1.07 usec/pass
qs.exists(): 1.21 usec/pass
qs.count(): 1.16 usec/pass
qs[0]: 1.27 usec/pass

I used a random query for each pass to reduce the risk of db-level caching. Test code:

import timeit

base = """
import random
from plum.bacon.models import Session
ip_addr = str(random.randint(0,256))+'.'+str(random.randint(0,256))+'.'+str(random.randint(0,256))+'.'+str(random.randint(0,256))
try:
    session = Session.objects.filter(ip=ip_addr)%s
    if session:
        pass
except:
    pass
"""

query_variatons = [
    base % "",
    base  % ".exists()",
    base  % ".count()",
    base  % "[0]"
    ]

for s in query_variatons:
    t = timeit.Timer(stmt=s)
    print "%.2f usec/pass" % (1000000 * t.timeit(number=100)/100000)

edited Feb 08 '20 at 22:40

answered Aug 03 '11 at 07:40

Sam Odio

2,717
5
22
25

13

In SQL, exists stops scanning the table as soon as a matching record is found whereas a count would be forced to scan the entire table. On that count, I would have thought exists would be quicker. – super9 Nov 03 '11 at 14:45
13

This is not true at all. count() on postgres is especially expensive, sometimes more expensive then a normal select query. exists() runs a query with no select_related or field selections and only gets a single record. it is much faster. source code is here (django/db/models/sql/query.py RawQuery.has_results): https://github.com/django/django/blob/master/django/db/models/sql/query.py#L438 – Chris Sattinger May 10 '14 at 18:50
6

This is definitely not true. [`EXISTS`](http://sqlblog.com/blogs/andrew_kelly/archive/2007/12/15/exists-vs-count-the-battle-never-ends.aspx) is more efficient. – Tom May 22 '15 at 16:50
@Sam Odio I think there's some flaws in your methodology, which I've tried to address in http://stackoverflow.com/a/43388116/698289 – DanH Apr 13 '17 at 09:14
Exists is orders of magnitude faster than count to check for existence, empirically tested, this answer is misleading – Makogan Feb 07 '20 at 19:31

score 11 · Answer 3 · edited Jun 20 '20 at 09:12

11

It depends on use context.

According to documentation:

Use QuerySet.count()

...if you only want the count, rather than doing len(queryset).

Use QuerySet.exists()

...if you only want to find out if at least one result exists, rather than if queryset.

But:

Don't overuse count() and exists()

If you are going to need other data from the QuerySet, just evaluate it.

So, I think that QuerySet.exists() is the most recommended way if you just want to check for an empty QuerySet. On the other hand, if you want to use results later, it's better to evaluate it.

I also think that your third option is the most expensive, because you need to retrieve all records just to check if any exists.

edited Jun 20 '20 at 09:12

Community

1
1

answered Jul 29 '11 at 07:01

Marcin Świerczyński

2,302
22
32

I'm pretty sure the last option only gets one record, not all records. Also, it's not clear from the documentation that exists is faster than count. If it is, I'd be curious to know the speed delta and whether that varies significantly based on conditions such as the length of the queryset. – Sam Odio Jul 29 '11 at 07:16
You're right - the last option will be translated to something like `SELECT foo FROM bar LIMIT 1`. And here we come to the point - all these methods are ORM API calls and translated to SQL queries. That's why the documentation couldn't say that `exists` is faster than `count`. Or how many times faster it is. It just depends on database implementations of these, its configuration and relevant dataset. I think Clueless is right - you would need to test it on your own to get exact results. – Marcin Świerczyński Jul 29 '11 at 14:37
1

I'd hate to have to test every time I use one of these. It'd be good to just have a general ballpark of what the performance delta is. IE, I'm looking for a statement like "In MySQL and PostgreSQL count() is faster for short queries, exists() is faster for long queries, and use QuerySet[0] when it's likely that you're going to need the first element and you want to check that it exists. However, when count() is faster it's only marginally faster so it's advisable to always use exists() when choosing between the two." – Sam Odio Jul 30 '11 at 03:39

score 6 · Answer 4 · edited May 23 '17 at 12:34

@Sam Odio's solution was a decent starting point but there's a few flaws in the methodology, namely:

The random IP address could end up matching 0 or very few results
An exception would skew the results, so we should aim to avoid handling exceptions

So instead of filtering something that might match, I decided to exclude something that definitely won't match, hopefully still avoiding the DB cache, but also ensuring the same number of rows.

I only tested against a local MySQL database, with the dataset:

>>> Session.objects.all().count()
40219

Timing code:

import timeit
base = """
import random
import string
from django.contrib.sessions.models import Session
never_match = ''.join(random.choice(string.ascii_uppercase) for _ in range(10))
sessions = Session.objects.exclude(session_key=never_match){}
if sessions:
    pass
"""
s = base.format('count')

query_variations = [
    "",
    ".exists()",
    ".count()",
    "[0]",
]

for variation in query_variations:
    t = timeit.Timer(stmt=base.format(variation))
    print "{} => {:02f} usec/pass".format(variation.ljust(10), 1000000 * t.timeit(number=100)/100000)

outputs:

           => 1390.177710 usec/pass
.exists()  => 2.479579 usec/pass
.count()   => 22.426991 usec/pass
[0]        => 2.437079 usec/pass

So you can see that count() is roughly 9 times slower than exists() for this dataset.

[0] is also fast, but it needs exception handling.

score 1 · Answer 5 · answered May 13 '20 at 19:17

I was also in this trouble. Yes exists() is faster for most cases but it depends a lot on the type of queryset you are trying to do. For example for a simple query like: my_objects = MyObject.objets.all() you would use my_objects.exists(). But if you were to do a query like: MyObject.objects.filter(some_attr='anything').exclude(something='what').distinct('key').values() probably you need to test which one fits better (exists(), count(), len(my_objects)). Remember the DB engine is the one who will perform the query, and to get a good result in performance, it depends a lot on the data structure and how the query is formed. One thing you can do is, audit the queries and test them on your own against the DB engine and compare your results you will be surprised by how naive sometimes django is, try QueryCountMiddleware to see all the queries executed, and you will see what i am talking about.

score 1 · Answer 6 · answered Jul 29 '11 at 05:30

I would imagine that the first method is the most efficient way (you could easily implement it in terms of the second method, so perhaps they are almost identical). The last one requires actually getting a whole object from the database, so it is almost certainly the most expensive.

But, like all of these questions, the only way to know for your particular database, schema and dataset is to test it yourself.

In Django, what is the most efficient way to check for an empty query set?

6 Answers6

Use QuerySet.count()

Use QuerySet.exists()

Don't overuse count() and exists()

Linked