0

How can I perform the following correlated EXISTS and NOT EXISTS in Django?

I do not want to use IN and NOT IN. These are not the same as EXISTS and NOT EXISTS.

Correlated EXISTS:

SELECT *
FROM foo
WHERE EXISTS (
    SELECT 1
    FROM bar
    WHERE foo.baz = bar.baz

)

Correlated NOT EXISTS:

SELECT *
FROM foo
WHERE NOT EXISTS (
    SELECT 1
    FROM bar
    WHERE foo.baz = bar.baz

)
Matthew Moisen
  • 16,701
  • 27
  • 128
  • 231
  • https://docs.djangoproject.com/en/4.0/topics/db/queries/#complex-lookups-with-q I don't know exactly what you mean by "correlated", but you can chain together any amount of complex sql conditions with compounding filters. – noone392 Feb 13 '22 at 18:24
  • In my example, see how the `foo` table is being linked to the `bar` table via the exists? Specifically this condition: `WHERE foo.baz = bar.baz`. That's what I mean by correlated, I'm not sure if that is the correct word though. – Matthew Moisen Feb 13 '22 at 18:29
  • hmm I am still a tad confused. in the first one you want to return all foo obects when there are any instances where the baz column in foo has an equivalent value in the baz column in bar? and in the second example get all foo objects where there are NOT matches between the 2 tables? your code does not represent WHERE foo.baz = bar.baz – noone392 Feb 13 '22 at 18:38
  • How are the two models related? Is `foo.baz` a ForeignKey to `bar`? – Iain Shelvington Feb 13 '22 at 18:39
  • @noone392 Yes that is correct. It is "equivalent" to `select * from foo where baz in (select baz from bar)` or `select * from foo where baz not in (select baz from bar)`. However the database uses a different algorithm when you do the IN vs EXIST and NOT IN vs NOT EXISTS style, and there are some edge cases with null columns and indexes which would make you prefer one over the other. – Matthew Moisen Feb 13 '22 at 18:45
  • @IainShelvington It could be a foreign key, but it does not have to. – Matthew Moisen Feb 13 '22 at 18:46
  • ok so a few things. In django the query sets don't actually run until you need the data. There is a separate caching system that goes on underneath for optimization. It looks like there are some answers. But let me strongly recommend you add a foreign relationship or a many to many between the tables. These intermediate caching tables is what allows the engine to optimize to traverse seamlessly. So if you specified a many to many and called the caching table baz, you can do things like "foo.bar" and "bar.foo.count()" extremely fast. – noone392 Feb 13 '22 at 19:02
  • I want to expand on this. if baz is a text field, and both bar and foo have 10,000 entries that is 100,000,000 string comparisons each time you run this if you don't match on a static value of baz. Not even related to django but this architecture will crash in production with any framework because the matching case is exponential. any acceptable architecture design would have a foreign key or a caching table. If nothing else everytime you make an entry on foo or bar check the opposing table for a duplicate value and record that in a text file lol. – noone392 Feb 13 '22 at 19:24
  • @noone392 This is just a contrived example. In reality you would have an additional condition limiting the number of rows from `foo` returned and an index on that column, and you would have a b-tree index on bar.baz. It is extremely fast and very common to execute this kind of query. – Matthew Moisen Feb 13 '22 at 19:30
  • Dude there is no one that wouldn't see that your database isn't normalized and that your matching case does not grow linearly. If your data allows for some B-Tree optimization then this question is irrelevant in the first place. I would recommend using that to see how many they have in common. Otherwise just create a many-to-many in django sense that is exactly what it designed for. – noone392 Feb 13 '22 at 19:44
  • @noone392 You seem to have some misunderstanding. – Matthew Moisen Feb 13 '22 at 20:41

2 Answers2

0

You can use method of QuerySet - exists(). Assuming that:

class Foo(models.Model):
    bar = models.ForeignKey(Bar, ..., related_name='foos')

You can check if there is relation:

foo = Foo.objects.get(id=1)
bar = Bar.objects.get(id=1)

bar.foos.exists()   # returns True if not empty else False
foo.bar             # returns related object or None
NixonSparrow
  • 6,130
  • 1
  • 6
  • 18
  • 1
    and to add to this you can compound .filter as many times as you want and then within a filter add complex queries with "Q" objects – noone392 Feb 13 '22 at 18:26
  • This is just `select 1 from bar where foo = 'foo' limit 1`. – Matthew Moisen Feb 13 '22 at 18:28
  • Just use any filter in it. `exists` will return either `True` or `False` depending on if there is at least one object. You can filter it with `ForeignKey` relation and reverse-relation. Edited answer for more info. – NixonSparrow Feb 13 '22 at 18:56
0

You can use an Exists() subquery to generate a query like you desire. Not entirely sure how your model looks but the following may look similar to what you want

from django.db.models import Exists, OuterRef

matching_bars = Bar.objects.filter(baz=OuterRef('baz'))
Foo.objects.filter(Exists(matching_bars))

To use NOT EXISTS just prefix the Exists subquery with ~

Foo.objects.filter(~Exists(matching_bars))
Iain Shelvington
  • 31,030
  • 3
  • 31
  • 50
  • Thanks, this is correct. May I ask where you learned this? Did you just read the docs in the entirety or did you pick this up in any book? – Matthew Moisen Feb 13 '22 at 19:36
  • @MatthewMoisen I have just read the docs in their entirety, the Django docs are some of the best I have ever seen and include pretty useful examples – Iain Shelvington Feb 13 '22 at 19:41