7

Say I have the following models:

class Image(models.Model):
    image   = models.ImageField(max_length=200, upload_to=file_home)
    content_type = models.ForeignKey(ContentType)
    object_id = models.PositiveIntegerField()
    content_object = generic.GenericForeignKey()

class Article(models.Model):
    text = models.TextField()
    images = generic.GenericRelation(Image)

class BlogPost(models.Model):
    text = models.TextField()
    images = generic.GenericRelation(Image)

What's the most processor- and memory-efficient way to find all Articles that have at least one Image attached to them?

I've done this:

Article.objects.filter(pk__in=Image.objects.filter(content_type=ContentType.objects.get_for_model(Article)).values_list('object_id', flat=True))

Which works, but besides being ugly it takes forever.

I suspect there's a better solution using raw SQL, but that's beyond me. For what it's worth, the SQL generated by the above is as following:

 SELECT `issues_article`.`id`, `issues_article`.`text` FROM `issues_article` WHERE `issues_article`.`id` IN (SELECT U0.`object_id` FROM `uploads_image` U0 WHERE U0.`content_type_id` = 26 ) LIMIT 21

EDIT: czarchaic's suggestion has much nicer syntax but even worse (slower) performance. The SQL generated by his query looks like the following:

SELECT DISTINCT `issues_article`.`id`, `issues_article`.`text`, COUNT(`uploads_image`.`id`) AS `num_images` FROM `issues_article` LEFT OUTER JOIN `uploads_image` ON (`issues_article`.`id` = `uploads_image`.`object_id`) GROUP BY `issues_article`.`id` HAVING COUNT(`uploads_image`.`id`) > 0  ORDER BY NULL LIMIT 21

EDIT: Hooray for Jarret Hardie! Here's the SQL generated by his should-have-been-obvious solution:

SELECT DISTINCT `issues_article`.`id`, `issues_article`.`text` FROM `issues_article` INNER JOIN `uploads_image` ON (`issues_article`.`id` = `uploads_image`.`object_id`) WHERE (`uploads_image`.`id` IS NOT NULL AND `uploads_image`.`content_type_id` = 26 ) LIMIT 21
hanksims
  • 1,479
  • 2
  • 12
  • 22
  • Is this your actual model structure, or is there an entire class hierarchy that you aren't representing in your question example for simplicity's sake? I ask because this particular example doesn't require generics at all. – Jarret Hardie Dec 23 '09 at 00:23
  • No, this is a stripped-down and simplified model structure. – hanksims Dec 23 '09 at 00:27
  • 1
    Though your accepted answer works quite well, I'm curious to know what the solution would be if you were requiring something other than 'at least one image'. – czarchaic Dec 23 '09 at 05:32
  • 1
    Well, this was my particular use case, so I haven't really thought about it. Maybe, like, if I wanted all Articles with at least 2 images? Seems like you might want to use the approach you outlined -- but you'd have to cache the living bejeezus out of it, because that is a long, long-running query if you have any significant number of Images or Articles in your system. – hanksims Dec 23 '09 at 08:03

2 Answers2

6

Thanks to generic relations, you should be able to query this structure using traditional query-set semantics for reverse relations:

Article.objects.filter(images__isnull=False)

This will produce duplicates for any Articles that are related to multiple Images, but you can eliminate that with the distinct() QuerySet method:

Article.objects.distinct().filter(images__isnull=False)
Jarret Hardie
  • 95,172
  • 10
  • 132
  • 126
  • I think we have a winner! I'll post the generated SQL in another edit, just for completeness's sake. Ran lighting-quick, though. is_null=False ... sometimes the simplest things are just staring you right in the face. – hanksims Dec 23 '09 at 00:51
  • Thanks hanksims... hope it works out :-) I admit I haven't looked at the SQL, so definitely curious to see that. – Jarret Hardie Dec 23 '09 at 00:55
1

I think your best bet would be to use aggregation

from django.db.models import Count

Article.objects.annotate(num_images=Count('images')).filter(num_images__gt=0)
czarchaic
  • 6,268
  • 29
  • 23