2

I'm working with django-eztables to do the server-side processing for a datatable, and I want to add some fields involving aggregation (as mentioned here)

The following each work fine individually:

def get_queryset(self):
qs = super(SomeObjectDataTableView, self).get_queryset()
return qs.select_related().annotate(items_count=Count('items'))

And

def get_queryset(self):
qs = super(SomeObjectDataTableView, self).get_queryset()
return qs.select_related().annotate(total_sum=Sum('anotherobject__differentobject__total'))

but if I try to do both, in either order, in the same annotate() or chained one after the other, they both end up as the same number, which is the product of what they each should be.

If I add distinct=True to the Count, it produces the correct value, but it still makes the Sum give an inflated value. (Again, changing the order doesn't help, and Sum doesn't take a distinct parameter)

I've seen a couple of similar questions on SO, but most seemed to deal with multiple counts, which can be resolved using distinct=True. There was one that had a Sum, but the solution involved using extra() and some handcrafted SQL, which I've been so far unable to adapt to work with all the foreign key traversal I need to do (I've used SQL a bit, but I'm by no means an expert). Here's the basic setup of the relevant models, in case extra() is the only viable solution:

  • Item has a foreignkey to SomeObject
  • AnotherObject has a foreignkey to DifferentObject and a foreignkey to SomeObject

If anyone has any idea how I can get around these problems and get both annotations on my queryset, it would be much appreciated.

Community
  • 1
  • 1
StephenTG
  • 2,579
  • 6
  • 26
  • 36

1 Answers1

1

Django is opaque about how relationships are translated into SQL... and the SQL usually explains the underlying issue in the Django output. Take a data structure like:

Adult Table (Name)

  • Bob
  • ...

Child Table (Parent, Child)

  • Bob, John
  • Bob, Stacy
  • ...

Pet Table (Owner, Pet)

  • Bob, Cat
  • Bob, Dog

Lets say you want a table with

  • Name
  • # Children
  • # Pets

Annotations are achieved using JOINs. If you try to annotate with two counts, Django will run the following query (pseudo SQL in case readers don't know SQL):

SELECT Adult's Name, Count of Children, Count of Pets 
FROM Adult 
JOIN Child (where the adult is the parent) 
JOIN Pet (where the adult is the owner)

Unfortunately, this will result in the following rows for Bob:

Bob, John, Cat
Bob, John, Dog
Bob, Stacy, Cat
Bob, Stacy, Dog

If you simply count Child and Pet, you'll get the product of the two counts (4 = 2 x 2). In the examples you've seen, this is resolved by including the DISTINCT value in each count to eliminate the duplicates.

In your case, instead of distinctly named object like pets, you're collecting totals from a related object:

Bob, John, 250
Bob, John, 100
Bob, Stacy, 250
Bob, Stacy, 100

Even if you could get DISTINCT working in a simple case, it would be extremely dangerous because totals on two related objects could be exactly the same. A DISTINCT call would eliminate one of the two values even though both are valid. Technically, this approach could ALSO break with a named value. For example, if you wanted to count the number of pets on a street, DISTINCT could provide the wrong answer if two of those pets had the same name.

Achieving the desired outcome directly is hard in Django because it's hard in SQL:

  • The "simplest" approach is to run the full query but divide the SUM by the COUNT when you use the value. You can see why that works in the example above... since the entire total will be listed for each instance of the other counted value.
  • How to achieve this depends on your situation:
    • I think you want to use extra(). The answer to this question shows you how to combine two aggregate values using extra() and no raw SQL. I presume (but have not tested) that you could just as easily divide a sum by a count.
    • You could manually iterate through values and run the second query. Since you're using Datatables, I presume this is infeasible.
    • Depending on your Datatable framework, you might also be able to do this when defining or displaying the Datatable.

This may not be exhaustive, but the first is pretty strait-forward so I didn't bother thinking up others.

claytond
  • 1,061
  • 9
  • 22