0

When I query my database I get the same entry three times with three different counts. In the result list I would like to get just one entry showing the sum of these counts.

What I would like to display is the total for a specific period.

In my query I enter the dates (from - until), a metric_type, a data_type and a year of publication.

For instance I on the website when I enter

from 201903 until 201905 

I retrieve three titles that have been accessed in this period as well as how many times they have been accessed.

Title, Publisher,    DOI,   data_type,YOP,[some other stuff] Counts

Title A  publisherA   1234    Article, 2006, [some_other_stuff], 3
Title A  publisherA   1234    Article, 2006, [some_other_stuff], 5
Title A  publisherA   1234    Article, 2006, [some_other_stuff], 3

What I would need is something like

Title, publisher, DOI,   data_type, YOP,  [some_other_stuff],  Total period

Title A, publisherA 1234   Article, 2006,   [some_other_stuff],    11


In views.py I have the following code:

    q_report = Q()


        var0 = self.request.GET.get("period1", "")
        var1 = self.request.GET.get("period2", "")
        var2 = self.request.GET.get("metric_type", "")
        var3 = self.request.GET.get("data_type", "")
        var4 = self.request.GET.get("YOP", "")



        if var0:
            q_report = q_report & (Q(month__gte=var0) & Q(month__lte=var1) )
            #q_report_count = q_report_count & (Count(month__gte=var0) & Count(month__lte=var1) )
        if var2:
            q_report = q_report & (Q(metric_type=var2))
            #q_report_count = q_report_count & (Q(metric_type=var2))
        if var3:
             q_report = q_report & (Q(data_type=var3))
        if var4:
             q_report = q_report & (Q(YOP=var4))

If I use

qs = self.model.objects.filter(q_report).select_related()

I will get back the three entries

The result is

<QuerySet [<Model: Title A>, <Model: Title A>, <Model: Title A>]>

I can easily calculate the total by using

total = self.model.objects.aggregate(total=Sum('counts', filter=q_report))

The result is

{total : 11.0}

But ideally I need to get the other values as well

Right now, I cannot see how I can get the two together, i.e. something like


<QuerySet [<Model: {'title':Title A, 'total':11}>, Model: {'title':Title B, 'total':7}>]

I don't even know whether this is possible or desired. But I need to return the queryset and the total.

Can anyone help? Thanks.

Py_Bear
  • 23
  • 5

2 Answers2

1

This is in essence a modeling problem. Instead of defining a model like:

# modeling with data duplication

class SomeModel(models.Model):
    title = models.CharField(max_length=256)
    publisher = models.CharField(max_length=256)
    doi = models.CharField(max_length=256)
    publication_type = models.CharField(max_length=128)
    year = models.IntegerField()
    date = models.DateField()
    counts = models.IntegerField()

You should make a model like Publication, and link the SomeModel to that Publication, like:

# modeling without data duplication

class Publication(models.Model):
    title = models.CharField(max_length=256)
    publisher = models.CharField(max_length=256)
    doi = models.CharField(max_length=256)
    publication_type = models.CharField(max_length=128)
    year = models.IntegerField()

class PublicationSale(models.Model):
    publication = models.ForeignKey(Publication, on_delete=models.CASCADE)
    date = models.DateField()
    counts = models.IntegerField()

This will reduce the database size, will make it less likely that your database has inconsistencies (like if you change the year of a publication that some records are not updated properly), and it might make the database faster for some queries.

In that case you can query like:

from django.db.models import Sum

Publication.objects.annotate(
    total=Sum('publication_sale__counts')
)

You then retrieve a QuerySet with Publication objects that carry an extra attribute .totals that is the sum of the counts of the related PublicationSale objects.

If you do not do this remodeling, you can work with:

from django.db.models import Sum

qs = self.model.objects.filter(q_report).values(
    'title', 'publisher', 'doi', 'publication_type', 'year'
).annotate(
    total=Sum('counts')
).order_by(
    'title', 'publisher', 'doi', 'publication_type', 'year'
)

which then results in a QuerySet of dictionaries:

<QuerySet [
    {'title': 'Title A', 'publisher': 'pubA', ..., 'total': 11},
    {'title': 'Title B', 'publisher': 'pubA', ..., 'total': 7},
    {'title': 'Title A', 'publisher': 'pubB', ..., 'total': 3},
]>

But as you can see the query does not yield models, and it is a bit "unstable" in the sense that each time you add extra data of your "functionally related entity", you will need to update the query.

Willem Van Onsem
  • 443,496
  • 30
  • 428
  • 555
  • Thank you, that's useful, but I really need a queryset that looks like this: ``` ,, ] {'title': 'Title B', 'total': 7}, {'title': 'Title C', 'total': 3}, ]> ``` If the model name is not in there, only the title will be displayed, not everything else. If I leave out .values('title'), I get the same query set as before... – Py_Bear May 23 '19 at 10:32
  • @PyBear: but since these are (if I understood it correctly), different models, what model do you want to return? – Willem Van Onsem May 23 '19 at 10:36
  • Oh no, it's the same model, actually. The things is, if I use your solution in my result list only the title is displayed and none of the other meta data which are supposed to be there, like publisher, DOI, ISBN, data_type etc. Sorry - I simplified to much above, so this part didn't get across correctly. I'll try and edit the original question. – Py_Bear May 23 '19 at 10:44
  • @Py_Bear: well if these are separate models, you can annotate like `MyModel.filter(..).annotate(total=Sum('counts'))`, then each `MyModel` will have an attribute `.total`, but this will then always be a multiple of the `Counts` of that record evidently. – Willem Van Onsem May 23 '19 at 10:46
  • @BearBrown: because otherwise the query will not perform a `GROUP BY title`. – Willem Van Onsem May 23 '19 at 10:46
  • @Py_Bear: this really looks like "bad modeling": your rows repeat all kinds of data, and therefore grouping is hard. You should make a different model, like `Publication`, and use a `ForeignKey` to avoid duplication, in that case, annotating is simple, and the database will typically be smaller and faster. – Willem Van Onsem May 23 '19 at 10:51
0

Replace your query by

self.model.objects.filter(q_report).annotate(total=Sum('counts')).values('title','total')

Where total would be the field you want to sum