
I'm trying to find the most likely predicted category for a datapoint. Since code is the best explanation:

models:

class DataPoint(models.Model):
    # ... unimportant fields
    pass

class PredResult(models.Model):
    likelihood = models.FloatField()
    value = models.IntegerField()
    data_point = models.ForeignKey(DataPoint)

For each DataPoint object I am trying to find the value for the PredResult with the highest likelihood. Currently I'm using a for-loop:

data_points = DataPoint.objects.select_related('predresult')
for dp in data_points:
    if dp.predresult_set.all().exists():
        val = dp.predresult_set.order_by('-likelihood')[0].value
        #do other stuff here with val and dp

I'd like to add a best_value field to the DataPoint queryset. Currently there are ~5 PredResult objects per DataPoint and ~20,000 DataPoints (although this may balloon rapidly), and this for-loop takes too long to complete within a view.

Can anyone suggest a way to deal with this, either a Django ORM trick or an extra() call on the queryset? Or do you think I should use a post-save method on the PredResult object and update a field on the DataPoint object directly?

In case it matters, I'm using MySQL as the database backend.

JudoWill

1 Answer


Aggregation:

from django.db.models import Max
values = DataPoint.objects.annotate(max_result=Max('predresult__value'))

Now each element in values has a max_result attribute containing the largest value among its related PredResult objects.
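Note that Max('predresult__value') picks the largest value, which is not necessarily the value of the PredResult with the highest likelihood. One way to get the latter in a single query is a correlated subquery. The following is only a minimal sketch of that query shape, run against an in-memory SQLite database from the standard library rather than MySQL. The table name predresult, its columns, and the helper best_values are assumptions that mirror the models above; they are not taken from the question's actual schema.

```python
import sqlite3


def best_values(rows):
    """Return {data_point_id: value of the row with the highest likelihood}.

    rows is an iterable of (data_point_id, likelihood, value) tuples.
    The table layout is a hypothetical mirror of the PredResult model.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE predresult ("
        " id INTEGER PRIMARY KEY,"
        " data_point_id INTEGER,"
        " likelihood REAL,"
        " value INTEGER)"
    )
    conn.executemany(
        "INSERT INTO predresult (data_point_id, likelihood, value)"
        " VALUES (?, ?, ?)",
        rows,
    )
    # For each row, keep it only if it is the top-likelihood row of its
    # data point. Ties on likelihood are broken by the lowest id.
    query = """
        SELECT p.data_point_id, p.value
        FROM predresult AS p
        WHERE p.id = (
            SELECT q.id
            FROM predresult AS q
            WHERE q.data_point_id = p.data_point_id
            ORDER BY q.likelihood DESC, q.id
            LIMIT 1
        )
        ORDER BY p.data_point_id
    """
    result = dict(conn.execute(query).fetchall())
    conn.close()
    return result


# Data point 1: the largest value is 9, but the most likely value is 3.
sample = [(1, 0.2, 9), (1, 0.7, 3), (2, 0.5, 4), (2, 0.1, 8)]
best = best_values(sample)
```

Here best maps each data point to the value of its most likely prediction, whereas a plain Max over value would give 9 and 8 for the same data. In Django 1.11 and later, the same correlated subquery can be expressed in the ORM with Subquery and OuterRef inside annotate(); on older versions, raw SQL of this shape is the likely fallback.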

Daniel Roseman