0

I have the following models in Django (simplified for brevity):

class DistinctWord(models.Model):
    ...

class Word(models.Model):
    distinct_word = models.ForeignKey('DistinctWord', related_name='words')
    ...

class UserWord(models.Model):
    distinct_word = models.ForeignKey(DistinctWord, related_name='user_words')
    ...

In words: DistinctWord is the root of words derived from each other (e.g., silly, sillier, silliest), and UserWord is the user's dictionary. So, when a user adds a Word to the his dictionary he actually adds the root word (and thus all related words). So, I must bring an actual Word (e.g., the first one) when he requests to see/study the words in his dictionary.

That is, for a given queryset of UserWords (say uw), I would like to retrieve the first Word related for each row in that queryset (preferably in one or a few trips to the database, not one for each row). This would be a simple join, group by and limit 1 in raw sql, but I could not wrap my head around it in Django.

mehmet
  • 7,720
  • 5
  • 42
  • 48
  • This looks like a duplicate of [Django annotating with a first element of a related queryset](http://stackoverflow.com/questions/23629431/django-annotating-with-a-first-element-of-a-related-queryset), although there is no good answer there. Also have a look at [this blog post](http://blog.roseman.org.uk/2010/08/14/getting-related-item-aggregate/). It's not easy to do, basically. – solarissmoke Jun 30 '16 at 02:34
  • @solarissmoke please see my answer below, I think it is better than the one in the blog you referred. It does not fall back to raw sql even partially. – mehmet Jun 30 '16 at 19:20
  • There might not have been a good answer at the time you wrote this comment, but https://stackoverflow.com/a/60636238/188 is the best way to achieve this (although it may require a bit more than just that). https://schinckel.net/2019/07/30/subquery-and-subclasses/ is probably a better resource (if I do say so myself). – Matthew Schinckel Jul 02 '20 at 03:32

3 Answers3

1

How about (given your queryset of uw): [obj.words.first() for obj in uw]

Patrick Beeson
  • 1,667
  • 21
  • 36
1

Let:

uw # be a given queryset of UserWord's
dw # be a queryset of DistinctWords (will be derived from `uw`)
w  # be a queryset of Words needed (will be derived from `dw`)

Each UserWord has a DistinctWord, and each DistinctWord has many Word's (loosely denoted as uw>dw<w).

Here is my answer:

dw_id=uw.values_list('distinct_word_id', flat=True) # 1: get dw ids from uw
dw=DistinctWord.objects.filter(id__in=dw_id)        # 2: get dw's
w_first_id=dw.annotate(first_word=Min('words')).values_list('first_word', flat=True) 
                                                    # 3: find id of first word 
w=Word.objects.filter(id__in=w_first_id)            # 4: get first words

In summary: lines 1 and 2 get dw and should be just 1 trip to the database

line 3 uses annotate followed by values_list to find the id of first related Word

Line 4 brings the actual Word objects from the id's generated in the previous step. Lines 3 and 4 should be another trip to the database since annotate is not a terminal statement.

Thus 2 trips to the database (not tested).

mehmet
  • 7,720
  • 5
  • 42
  • 48
1

You can do this using the Subquery API:

from django.db.models.expressions import Subquery, OuterRef

first_word = Word.objects.filter(
    distinct_word=OuterRef('distinct_word')
).order_by('pk').values('pk')[:1]

UserWord.objects.filter(
     # whatever filters...
).annotate(
     first_word=Subquery(first_word)
)

This will result in SQL that looks something like:

SELECT user_word.*,
       (SELECT word.id 
          FROM word 
         WHERE word.distinct_word_id = user_word.distinct_word_id
       ) AS first_word
  FROM user_word
 WHERE ...

This will probably not perform as well as a JOIN with a DISTINCT ON in postgres, and may not perform as well as a JOIN with a GROUP BY, as it will need to execute the subquery for each row.

Matthew Schinckel
  • 35,041
  • 6
  • 86
  • 121