1

Consider this django model setup:

from django.db import models

class Foo(models.Model):
    field_foo = models.CharField(max_length=20,
                                 null=False,
                                 blank=False,
                                 unique=True,)

class Bar(models.Model):
    field_bar = models.CharField(max_length=20,
                                 null=False,
                                 blank=False,
                                 unique=True,)

class Foobar(models.Model):
    field_foo = models.ForeignKey(foo,on_delete=models.CASCADE)
    field_bar = models.ForeignKey(bar,on_delete=models.CASCADE)

I want to look for two rows that have the same field_foo and field_bar values. I can do this manually, but I want to know if there is a feature of django that takes care of that. The way I do this now is:

for f in Foo.objects.all():
    for b in Bar.objects.all():
        fb = Foobar.objects.filter(foo=f, bar=b)
        if len(fb)>2:
            something='do'
scūriolus
  • 657
  • 2
  • 5
  • 15
Yotam
  • 10,295
  • 30
  • 88
  • 128
  • I havent tested (setting up those models can be time consuming),but this pure orm query should work, `Foobar.objects.filter(field_foo__field_foo=F('field_bar__field_bar')).values("field_foo__field_foo").annotate(freq=Count("field_foo__field_foo")).filter(freq__gte=2)` – Julkar9 Apr 11 '23 at 19:19

3 Answers3

2

You can use the annotate() and filter() methods together to filter the FooBar instances with the same Foo and Bar instances as relations.

Give this a try

duplicate_foobars = Foobar.objects.values('id', 'field_foo', 'field_bar').annotate(
    count=Count('id')
).filter(
    count__gt=1
).values_list('id', 'field_foo', 'field_bar')

for id, field_foo, field_bar in duplicate_foobars:
    print(f"Duplicate Foobar: #{id} field_foo={field_foo}, field_bar={field_bar}")

Output

Duplicate Foobar: #1, field_foo=1, field_bar=1

Tested models

class Foo(models.Model):
    field_foo = models.CharField(
        max_length=20,
        null=False,
        blank=False,
        unique=True,
    )


class Bar(models.Model):
    field_bar = models.CharField(
        max_length=20,
        null=False,
        blank=False,
        unique=True,
    )


class Foobar(models.Model):
    field_foo = models.ForeignKey(Foo, on_delete=models.CASCADE)
    field_bar = models.ForeignKey(Bar, on_delete=models.CASCADE)
Sumithran
  • 6,217
  • 4
  • 40
  • 54
  • Thanks. This is more or less what I need, but, is there a way for me to get the id of the duplicated ``Foobars``? I can do a query on the values, but is there a way to avoid that? – Yotam Apr 11 '23 at 20:16
  • please try the updated anser – Sumithran Apr 12 '23 at 14:46
1

Your manual way is also inefficient. You can do this in O(n) by using:

foo_bar_count = {}
for fb in Foobar.objects.all():
    foo_bar_pair = (fb.field_foo_id, fb.field_bar_id)
    
    if foo_bar_pair not in foo_bar_count:
        foo_bar_count[foo_bar_pair] = 0
        continue

    foo_bar_count[foo_bar_pair] += 1

    if foo_bar_count[foo_bar_pair] > 2:
        # do something

You can also try doing this using some fancy query. Check: Django select only rows with duplicate field values

Bartosz Stasiak
  • 1,415
  • 1
  • 4
  • 9
  • Thanks, my concern was efficiency indeed. Just a question, doesn't storing things in a dictionary makes the search as inefficient as my approach. I mean, I now add storing and getting info from a dictionary. – Yotam Apr 11 '23 at 20:11
  • 1
    In your approach you are doing 2 queries to database at first and then O(n*m) queries . So for 1000 (n) Foo and 1000 (m) bar you will make 1 000 002 queries to the database which will be extremely slow. In my case there is always only one query and operations on dict is very fast. The downside is memory usage. – Bartosz Stasiak Apr 12 '23 at 08:01
  • How costly are they in memory terms (assuming the same numbers). – Yotam Apr 12 '23 at 10:42
  • I ran some benchmark and it was saying that, in the worst case scenario of 1 000 000 FooBar objects with unique pairs it will take 100 MB of memory – Bartosz Stasiak Apr 12 '23 at 16:33
1

Assuming there's a typo in your question (there is a f instead of a b, what you want is an alternative to fb = Foobar.objects.filter(foo=f, bar=b) for any pair f,b), here's a hint:

queryset = (
    Foobar.objects
    .values("field_foo", "field_bar")
    .annotate(nb=models.Count("id"))
    .filter(nb__gt=1)
)

Actually, the most optimal approach will depend on what you want to achieve then.

(1) If you want to know how many Foobar instances exist for each couple of values f,b:

foobars = queryset.values("field_foo__field_foo", "field_bar__field_bar", "nb")

(2) Or if you want to retrieve all corresponding instances, you could do:

from django.db.models import Q

any_of = (
    Q(field_foo__field_foo=x, field_bar__field_bar=y)
    for x, y in queryset.values_list("field_foo__field_foo", "field_bar__field_bar")
)
criteria = Q()
for q_obj in any_of:
    criteria |= q_obj

Foobar.objects.filter(criteria)

There might be a way to do the same thing using a subquery too, have a look at the doc.

scūriolus
  • 657
  • 2
  • 5
  • 15