Check if object_id occurs more than once in queryset.annotate Case When parameter

Question

Documentation field look up doesn't really help in my case

What my query looks like now

date_delta = 2

queryset = TrendData.objects.filter(owner__trend_type__mnemonic='posts', 
 date_trend__date__range=[date_from, date_to]).values('owner_id', 'owner__name')

queryset.annotate(owner_name=F('owner_id__name')).values('owner_name', 'owner_id').annotate(
    views = Sum(Case(When(owner_id__gt=1, then=F('views') / date_delta)), default=('views')...,
                output_field=IntegerField() )
)

the queryset output looks like this:

{'owner_id': 1306, 'owner__name': 'Some name123'}, 
{'owner_id': 1307, 'owner__name': 'Somename as well'}, 
{'owner_id': 1308, 'owner__name': 'aand another name'}, 
{'owner_id': 1306, 'owner__name': 'Some name123'}

as you can see there are matching owner_id's and the queryset len() is 100k per day, so if range of dates is 5 days queryset len() == 500k. my models.py look like this

class Owner(models.Model):
    class Meta:
        verbose_name_plural = 'Objects'

    TREND_OWNERS = Choices('group', 'user')

    link = models.CharField(max_length=255)
    name = models.CharField(max_length=255)
    owner_type = models.CharField(choices=TREND_OWNERS, max_length=50)
    trend_type = models.ForeignKey(TrendType, on_delete=models.CASCADE)

    def __str__(self):
        return f'{self.link}[{self.trend_type}]'


class TrendData(models.Model):
    class Meta:
        verbose_name_plural = 'Trends'

    owner = models.ForeignKey(Owner, on_delete=models.CASCADE)
    views = models.IntegerField()
    views_u = models.IntegerField()
    likes = models.IntegerField()
    shares = models.IntegerField()
    interaction_rate = models.DecimalField(max_digits=20, decimal_places=10)
    mean_age = models.IntegerField()
    date_trend = models.DateTimeField()

I realised that it will work fine, but it will be wrong, since if owner_id is great it will divide by date_delta, where in my case I want if owner_id occurence in queryset more than once. I have tried owner_id__count__gt but that doesnt exist :(

I would love to know if there is a way to count owner_id occurence in my annotate Case(When()) queryset. that will literally solve my problem. if it's greater than 1 than we divide by date_delta, else we leave it as it is

Update:

Just to be clear, this annotation does an excellent job, however it also divides some queries that i don't want to be divided (in my case NON duplicate owner_id queryset still divides it's views, shares etc by 2) so that is why I use Case(When()) mentioned above

queryset.values('owner__name', 'owner_id').annotate(
    views=Sum('views') / 2, 
    views_u=Sum('views_u') / 2, 
    likes=Sum('likes') / 2,
    shares=Sum('shares') / 2, 
    interaction_rate=Sum('interaction_rate') / 2,
    mean_age=Sum('mean_age') / 2)

UPDATE #2 This is my logic but in python

json_output = []
for item in (queryset
                .values('owner__name', 'owner_id')
                .annotate(owner_count=Count('owner_id'))
                .annotate(views=Sum('views'), views_u=Sum('views_u'),
                            likes=Sum('likes'),
                            shares=Sum('shares'),
                            interaction_rate=Sum('interaction_rate'),
                            mean_age=Sum('mean_age')):
    if item['owner_count'] > 1:
        item['views'] = item['views'] / date_delta
        item['views_u'] = item['views_u'] / date_delta
        item['likes'] = item['likes'] / date_delta
        item['shares'] = item['shares'] / date_delta
        item['interaction_rate'] = '{:.10f}'.format(
            Decimal(item['interaction_rate']) / date_delta)
        item['mean_age'] = item['mean_age'] / date_delta
        json_output.append(item)
    else:
        json_output.append(item)

What is the end goal of this query? owners with more than one trend data? — Sayse, Apr 17 '19 at 16:27
@Sayse the end goal is to sum all the views of trenddata.objects and if there are more than 1 owner in those objects divide it by date_delta(2 in our case) if not just leave the sum as it is — nexla, Apr 17 '19 at 16:57
There never will be more than one owner though as it is a foreign key rather than a many to many relationship — Sayse, Apr 17 '19 at 17:31
My queryset (TrendData) is range of dates example : ```)}, {'id': 275370, 'owner_id': 155116, 'views': 15280, 'views_u': 13351, 'likes': 160, 'shares': 10, 'interaction_rate': Decimal('0.012733128900000'), 'mean_age': 32, 'source_id': 3, 'date_trend': datetime.datetime(2019, 4, 5, 0, 0, tzinfo=)}``` there are more than 2 owners in 1 queryset — nexla, Apr 17 '19 at 17:35
So you're looking for an aggregation of views rather than an annotation? — Sayse, Apr 17 '19 at 17:42
@Sayse nope, nnotation, aggregation will give me just a sum of all views etc annotation will give me for each of the objects that have common owners ( IF i m not mistaken tho) — nexla, Apr 17 '19 at 20:07
@Sayse I have updated my question once again with an example of an 'almost' working code — nexla, Apr 18 '19 at 09:35

Endre Both · Accepted Answer · 2019-04-18T22:07:27.523

Update: Turns out that I hadn't tested this fully after all (I thought I had, apologies). You need to have the Case wrapped around Sum, the other way around (Sum around Case) won't work no matter the Django version:

(queryset
    .values('owner', owner_name=F('owner__name'))
    .annotate(owner_count=Count('owner'))
    .annotate(views = Case(
        When(owner_count__gt=1,
             then=Sum(F('views') / date_delta)),
        default=Sum('views'),
        output_field=IntegerField()
    ))
)

A slight variation would be to use a subquery. Raydel's subquery that calculates the Trenddata count for every Owner works in principle, but will be prohibitively slow as it does an aggregation for every single row in Trenddata (not just for unique Owners).

A different subquery provides a faster way of getting the same result. It does the heavy lifting of counting Owners in Trenddata only once and then checks for every Trenddata object if its owner is in the list. I would think this should still be slower than my first query, but strangely enough, it came out on par in my short tests (with around 3m rows).

(queryset
    .values('owner', owner_name=F('owner__name'))
    .annotate(multi=Case(
        When(owner__in=Subquery(TrendData.objects
                                    .values('owner')
                                    .annotate(cnt=Count('owner'))
                                    .filter(cnt__gt=0)
                                    .values('owner')), 
             then=1),
        default=0,
        output_field=IntegerField())
    ) 
    .annotate(views = Case(
        When(multi=1,
             then=Sum(F('views') / date_delta)),
        default=Sum('views'),
        output_field=IntegerField())
    )
)

You can wrap the two annotations in one, but if you're reusing multi for several more annotations rather than just one as in my example, separating the two saves you from repeating the subquery for every annotation.

I've got same error as i mentioned to @Raydel comment above :( — nexla, Apr 17 '19 at 14:31
hmm i tried once again it doesnt :( but i guess i should mentioned that my django is 1.11 :/ — nexla, Apr 17 '19 at 14:45
I think the error was on my part, not due to Django versions. Take the above two queries for a spin if you like. — Endre Both, Apr 18 '19 at 21:57

Raydel Miranda · Answer 2 · 2019-04-18T21:09:31.620

1

First, I think this is wrong owner_name=F('owner_id__name' it hsould be owner_name=F('owner__name'.

If I understood, you want to annotate TrendData queryset with the amount of TrendData instances that have the owner.

You can use a Subquery to achieving that:

owner_td_count = Owner.objects.annotate(
    td_count=Count('trenddata_set')
).filter(
    id=OuterRef('owner_id')
).values('td_count')[:1]

Then annotate first by counting occurrences of owner_id:

queryset.annotate(
    owner_name=F('owner__name'),
    owner_id_count=Subquery(owner_td_count)   # How many DataTrend's have the owner with id=owner_id
    ).values('owner_name', 'owner_id').annotate(
        # ...
    )
)

Then you could in you Case/when construction:

Case(
    When(
        owner_id_count=1, then=F('views'), 
        default=F('views') / date_delta)),
        output_field=IntegerField() 
    )
)

edited Apr 18 '19 at 21:09

answered Apr 17 '19 at 14:07

Raydel Miranda

13,825
3
38
60

this looks like it, however im getting an error :( ```django.core.exceptions.FieldError: Cannot compute Sum(' THEN , ELSE Value(None)>'): ' THEN , ELSE Value(None)>' is an aggregate``` – nexla Apr 17 '19 at 14:30
Perhaps the problem was that aggregates do no support lookups inside `When` expressions, I edit the answer in order to avoid the use of `__gt` lookup on the `owner_id_count` aggregate inside the When expression. – Raydel Miranda Apr 17 '19 at 14:43
You need to select some (non-unique) `values` *before* counting, otherwise the count will always be `1`. – Endre Both Apr 17 '19 at 14:45
@EndreBoth No, you don't. `owner_id` could be a result of having a field `owner` which is a foreign key to another model. And the foreign key (can be but ...) is not guaranteed to be unique across the table. – Raydel Miranda Apr 17 '19 at 14:50
But without `values`, you're grouping not only on `owner_id` (and `owner_name` which is neither here nor there), but also on the `id` of the model you're querying (here `TrendData`), leading to groups of one. – Endre Both Apr 17 '19 at 14:54
still getting this error :( Im doing this ```queryset.annotate( owner_name=F('owner__name'), owner_id_count=Count('owner_id') ).values('owner_name', 'owner_id').annotate( views = Sum(Case(When(owner_id_count=1, then=F('views'), views=F('views') / date_delta)), output_field=IntegerField()), ... ``` – nexla Apr 17 '19 at 15:44
Not just that have the owner, but that queryset has more than 1 of the same owner. Example : Queryset(owner_id : 1, views:10, likes:10... , owner_id : 2...,owner_id : 1..) And if queryset has more than 1 of the same owner_id's (in this example there are 2 owners with id of 1 ) we divide by date_delta, else we just use default value from queryset. Sorry if my question is missleading, English is my 2nd language :( – nexla Apr 17 '19 at 17:03
1

Yes, then the updated answer is the solution, give it a try. Please take into account I've not tested this, this is an effort for pointing out you in the right direction. – Raydel Miranda Apr 17 '19 at 17:04
still an error, changed ```'trenddata_set'``` to either trenda_data__owner_id and tried just ```trenddata```, both displaying same error :( On ```trenddata_set``` he can't resolve a key for a field tried double underscore and got ```'Unsupported lookup 'set' for AutoField or join on the field not permitted'``` – nexla Apr 17 '19 at 17:27
It's `trenddata`, not `trenddata_set` (and `owne*r*_td_count`). Other than that, it should work fine. – Endre Both Apr 18 '19 at 18:54

Check if object_id occurs more than once in queryset.annotate Case When parameter

2 Answers2