Running a Window function on a set of Max'd values

Question

So, I have an object Trainer which has reverse relations to many Survey objects. Each Survey object has a load of Integer fields and represents stats at a certain point in time. Not every Survey object will have every field filled in so I'm using django.db.models.Max() to get the latest values (values can never go down).

I am then trying to compare those values against every other Trainer object in the database with django.db.models.functions.window.PercentRank() to get each trainer's percentile.
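For reference, this is my understanding of what the SQL percent_rank window function computes: (rank - 1) / (rows - 1), where tied values share the lowest rank. The sketch below is plain Python rather than Django, and the helper name percent_rank and the sample numbers are made up for illustration.

```python
def percent_rank(values):
    """Return the percent rank of each value, ordered ascending.

    Mirrors the usual SQL definition: (rank - 1) / (total rows - 1),
    where rank is 1 plus the number of strictly smaller values, so
    ties share the same rank.
    """
    n = len(values)
    if n <= 1:
        # A single row (or no rows) always gets a percent rank of 0.
        return [0.0] * n
    result = []
    for v in values:
        rank = 1 + sum(1 for other in values if other < v)
        result.append((rank - 1) / (n - 1))
    return result


# Hypothetical Max'd values for four trainers.
ranks = percent_rank([10, 20, 20, 40])
```

The lowest value gets 0 and the highest gets 1, with the two tied values sharing the same rank in between.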

This is what I have. It runs fine up to the step commented "Window Expression - Calculate Percentile", after which I get an error!

from django.db.models import Max, Window, F
from django.db.models.functions.window import PercentRank

from survey.models import Survey, Trainer

fp_default_fields = ['badge_travel_km', 'badge_capture_total', 'badge_evolved_total', 'badge_hatched_total', 'badge_pokestops_visited', 'badge_big_magikarp', 'badge_battle_attack_won', 'badge_small_rattata', 'badge_pikachu', 'badge_legendary_battle_won', 'badge_berries_fed', 'badge_hours_defended', 'badge_raid_battle_won', 'gymbadges_gold', 'badge_challenge_quests', 'badge_max_level_friends', 'badge_trading', 'badge_trading_distance']

def calculate_foo_points(survey: Survey, fields: str=fp_default_fields, top_x: int=10):
    '''
    Calculates a Trainer's Foo Points at the time of Surveying
    '''

    # Base Query - All Trainers valid BEFORE the date of calculation
    query = Trainer.objects.filter(survey__update_time__lte=survey.update_time)
    # Modify Query - Exclude Spoofers
    query = query.exclude(account_falsify_location_spawns=True,account_falsify_location_gyms=True,account_falsify_location_raids=True,account_falsify_location_level_up=True)
    # Extend Query - Get Max'd Values
    query = query.annotate(**{x:Max(f'survey__{x}') for x in fields})
    # Window Expression - Calculate Percentile
    query = query.annotate(**{f'{x}_percentile':Window(expression=PercentRank(x), order_by=F(x).asc()) for x in fields})
    # Delay the fields we don't need
    query = query.only('user__id')
    # Get Trainer
    trainer = [x for x in query if x.pk == survey.trainer.pk]
    # Get 10* most highest ranked fields
    top_x_result = sorted([getattr(trainer, x) for x in fields])[:top_x]
    # Average together fields
    result = sum(top_x_result, top_x)
    return result

Error:

Traceback (most recent call last):
  File "/mnt/sshd/Gits/tl40/env/lib/python3.7/site-packages/django/db/backends/utils.py", line 85, in _execute
    return self.cursor.execute(sql, params)
psycopg2.ProgrammingError: WITHIN GROUP is required for ordered-set aggregate percent_rank
LINE 1: ...e_trading_distance") AS "badge_trading_distance", PERCENT_RA...
                                                             ^

If anyone is able to explain what this means or how to get around it, that would be great :)

Thank you!

score 2 · Accepted Answer · answered Oct 26 '18 at 15:51

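The issue was that PercentRank() doesn't take any arguments; the Django documentation wasn't very clear about this. Passing a field name puts an argument inside PERCENT_RANK(...) in the generated SQL, and PostgreSQL then treats it as the ordered-set aggregate form, which requires WITHIN GROUP, hence the error. The column to rank by belongs only in the Window's order_by, so the expression becomes Window(expression=PercentRank(), order_by=F(x).asc()).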

JayTurnr
