0

If I have:

class Info(Model):
    ...

class Ad(Model):
    listed_date = DatetimeField()
    info = ForeignKey('Info', related_name='ads', null=True)
    ....

I want to query Info based on fields within Ad, but only the latest ad. I know I can do:

Ad.objects.latest('listed_date')

But since I will be building up the query by chaining several filter/excludes together, I want something like:

query = query.filter(
    Q(**{
        'ads__latest__'+attr_name: query_value
    })
)

Or perhaps even have a field 'latest_ad' which always points to the most recent based on a certain field. The goal is to be able to query just the latest in the related field in a built up filter/exclude method.

How can I do this?

EDIT:

A little background... I have 2 models (LegalAd, TrusteeInfo) that store scraped data about the same auction item, some of the field need a fair deal of processing to extract the necessary values (hence my decision to store the information in separate models) store the data at different stages of processing. I then attempt to combine both models into one (AuctionItem), and use properties extensively to prioritze data from TrusteeInfo over LegalAd for the similar fields they share. The problem is that I would like to query those fields, which the use of properties prohibits. So I created a manager and overrode the filter and exclude methods to hold the prioritization logic. Below is the code:

class LegalAd(models.Model):
    listed_date = models.DateField(null=True)  # field I would like to use for latest query
    auction = models.ForeignKey('auction_table.Auction', related_name='legal_ads', null=True)
    ...


class TrusteeInfo(models.Model):
    auction = models.OneToOneField('auction_table.Auction', null=True)
    ...


class AuctionManager(models.Manager):
    def do_query_action(self, action, kwargs):
        trusteeinfo = apps.get_model('scrapers', 'TrusteeInfo')
        trustee_fields = [field.name for field in trusteeinfo._meta.get_fields()]
        legalad = apps.get_model('scrapers', 'LegalAd')
        legalad_fields = [field.name for field in legalad._meta.get_fields()]
        related_fields = trustee_fields + legalad_fields
        auction_native_fields = [
            'legal_ads',
            'trusteeinfo',
            'properties',
            'id',
            'pk',
            'created_date',
            'updated_date'
        ]
        query = super(AuctionManager, self)
        for attr, value in kwargs.items():
            attr_base = attr.split('__')[0]  # get the base attr name
            if attr_base in auction_native_fields:
                query = getattr(query, action)(**{attr: value})
            elif attr_base in related_fields:
                qs = []
                if attr_base in trustee_fields:
                    trustee_attr_name = 'trusteeinfo__' + attr
                    qs.append(Q(**{trustee_attr_name: value}))
                if attr_base in legalad_fields:
                    legalad_attr_name = 'legalads__' + attr
                    qs.append(Q(**{legalad_attr_name: value}))
                query = getattr(query, action)(reduce(or_, qs))
            else:
                raise AttributeError("type object `Auction` has no attribute '{attr}'".format(attr=attr))
        return query.distinct()

    def filter(self, **kwargs):
        return self.do_query_action('filter', kwargs)

    def exclude(self, **kwargs):
        return self.do_query_action('exclude', kwargs)


class Auction(models.Model):
    objects = AuctionManager()
    created_date = models.DateTimeField(auto_now_add=True)
    updated_date = models.DateTimeField(auto_now=True)

    @property
    def latest_ad(self):
        return self.legal_ads.exists() and self.legal_ads.latest('listed_date')

    @property
    def sale_datetime(self):
        if self.trusteeinfo and self.trusteeinfo.sale_datetime:
            return self.trusteeinfo.sale_datetime
        else:
            return self.latest_ad and self.latest_ad.sale_datetime

    @property
    def county(self):
        if self.trusteeinfo and self.trusteeinfo.county:
            return self.trusteeinfo.county
        else:
            return self.latest_ad and self.latest_ad.county

    @property
    def sale_location(self):
        return self.latest_ad and self.latest_ad.sale_address

    @property
    def property_addresses(self):
        if self.trusteeinfo and self.trusteeinfo.parsed_addresses.exists():
            return self.trusteeinfo.parsed_addresses
        else:
            return self.latest_ad and self.latest_ad.parsed_addresses

    @property
    def raw_addresses(self):
        if self.trusteeinfo and self.trusteeinfo.addresses:
            return self.trusteeinfo.addresses
        else:
            return self.latest_ad and self.latest_ad.addresses.get('addresses', None)

    @property
    def parcel_numbers(self):
        return self.latest_ad and self.latest_ad.parcel_numbers

    @property
    def trustee(self):
        if self.trusteeinfo:
            return self.trusteeinfo.trustee
        else:
            return self.latest_ad and self.latest_ad.trustee.get('trustee', None)

    @property
    def opening_bid(self):
        if self.trusteeinfo and self.trusteeinfo.opening_bid:
            return self.trusteeinfo.opening_bid
        else:
            return self.latest_ad and self.latest_ad.dollar_amounts.get('bid_owed', [[None]])[0][0]

    @property
    def deposit_amount(self):
        if self.trusteeinfo and self.trusteeinfo.deposit_amount:
            return self.trusteeinfo.deposit_amount
        else:
            return self.latest_ad and self.latest_ad.dollar_amounts.get('deposit', [[None]])[0][0]

    @property
    def sale_status(self):
        return self.trusteeinfo and self.trusteeinfo.sale_status

    @property
    def trustors(self):
        if self.trusteeinfo and self.trusteeinfo.parsed_names.exists():
            return self.trusteeinfo.parsed_names
        else:
            return self.latest_ad and self.latest_ad.parsed_names

It gets a bit more complicated with the fact that the ads are usually listed 2 at a time so there is a good chance of 2 ads showing up for the latest date, meaning I would have to run something like a first() method on it too. I could look out for certain kwargs and run a special query for that but how would I incorporate that into the the rest of the kwargs in the chained query? Ideally, if I could keep the one to many legal_ads, but also be able to do something like:

query.filter(latest_ad__<queryfield>=value)

or:

query.filter(legal_ads__latest__<queryfield>=value)

That would be great.

Verbal_Kint
  • 1,366
  • 3
  • 19
  • 35

2 Answers2

2

What you have is the so called greatest-n-per-group problem, its hard to deal with or even impossible with the ORM.

One way to approach the problem can be found here.

In your case it could be something like this:

Info.objects.filter(
    ad__listed_date__in=Info.objects.annotate(
            last_date=Max('ad__listed_date')
        ).values_list('last_date', flat=True)
    #now you can add more
    #ad__<somefiled> statements
    #but you need to make it in a single `.filter` call
    #otherwise the ORM will do separate joins per `.filter` call
)

I personally don't like this. It's looks like a hack to me, its not very efficient and it can very easy return bad results if a penultimate ad in some group have an equal listed_date to a last ad in another group.

Workarounds

If you give us some more background about why do you need to filter on the latest_ad per info, maybe we could find another way to get the same/similar results.

However, one workaround which I prefer, is to filter on some date_range. For example, don't search for the latest_ad, but .filter on the latest_ads in last_day or two or a week, depending on your needs. Its pretty easy and efficient (easy to optimize) query.

Info.objects.filter(
    ad__listed_date__gte=(today-timedelta(days=1))
    #now again you can keep adding more `ad__<somefiled>` statements
    #but make sure to enclose them in a single `.filter` call.
)

You also mention a good workaround, if you can easy keep up-to-date Info.latest_ad field, then I guess you will be good to go.

If you go for that approach make sure to set on_delete=models.SET_NULL because the default behavior (cascade delete) can bring you problems.

class Info(Model):
    #...
    
    #related_name='+' prevents creating a reverse relation
    #on_delete=models.SET_NULL prevents deleting the Info object when its latest ad got deleted.
    latest_ad = models.ForeignKey('Ad', 
        related_name='+',
        blank=True,
        null=True,
        on_delete=models.SET_NULL
    )
Community
  • 1
  • 1
Todor
  • 15,307
  • 5
  • 55
  • 62
  • 1
    @Verbal_Kint I vote for the approach of adding `Info.latest_ad` field and keep it up to date. I think its the best fit in your situation. – Todor Aug 05 '16 at 06:33
  • I think that's the route I'm going to take. However, would it be best to make it a one-to-one field? I was thinking about adding a method to the model that would be run every time I add new ads that would set the one-to-one with the latest ad. Although if there were a way to have that run automatically anytime new ads are added to the legal_ads one-to-many relationship that would be ideal... – Verbal_Kint Aug 05 '16 at 16:52
  • Are there drawbacks to have two relationships between the 2 models (one-to-many and one-to-one)? – Verbal_Kint Aug 05 '16 at 16:53
  • 1
    No there are no problem to have two relationships. About the One-to-One field, i think FK would be better, because you are restricting an ad to be last for only one info, maybe there could be a situation where the same ad will be last for two info models? About the function that can populate the last ad, you can check the django [signals](https://docs.djangoproject.com/en/1.9/topics/signals/) framework, where you can listen for a new add event and calculate the latest. But It also depends on how you add ads, if you use bulk_create, then no signals will be fired and you need to do it manually. – Todor Aug 05 '16 at 17:03
  • I'm not sure I understand the reasoning behind using `ForeignKey` instead of `OneToOneField` for the field `latest_ad`. I thought the point was to isolate 1 object so I can query it without the obstacle of the greatest-n-per-group problem you had mentioned earlier. Wouldn't creating a second relation which returns multiple objects just duplicate the problem? Thanks for your replies, they have been helpful. – Verbal_Kint Aug 05 '16 at 18:57
  • Sorry, I misunderstood the example, the relationship is being set on the `Info` side and you eliminated the reverse relationship with the `related_name='+'` argument. But, out of curiosity, how would that differ from a `OneToOneField`? – Verbal_Kint Aug 05 '16 at 19:08
  • 1
    `OneToOneField` is a `ForeignKey` + `unique=True`, which means `Info` object will always have only one `latest_ad`, the question is: should we allow an `ad` with `id=1` to be the `latest_ad` of `info` object with `id=1` and `info` object with `id=2` at the same time, or we should restrict that? `OneToOne` field restrict it (with `unique=True`), `ForeignKey` allows it (one `ad` can be latest to several `info` objects, but again every `info` object will have only one `lastest_ad`, its just shared across several `info` objects). Huh, did i make it more clear now? :D – Todor Aug 05 '16 at 19:16
0

You can use .latest() along .filter()

Ad.objects.filter(your_filter=your_value).latest('listed_date')

or using oder_by

Ad.objects.filter(your_filter=your_value).order_by('-listed_date')[0]
levi
  • 22,001
  • 7
  • 73
  • 74
  • There's no guarantee the one to many field I want to take the latest one from will be in the query. I am building it with many chained filter/exclude function in the manager by overriding the filter/exclude methods – Verbal_Kint Aug 05 '16 at 01:20