2

I'm currently preparing some logged items into a JSON serialized format.

I am attempting to do this via Django's built-in ORM, utilising annotations and aggregations.

I'm trying to replicate the following structure per each of these "logged items":

...
{
    "count": 20,
    "metric-type": "total-dataset-requests",
    "access-method": "regular",
    "country-counts": {
        "au": 6,
        "jp": 1,
        "us": 13
    }
}
...

I've currently built this queryset from my own knowledge:

metrics = LoggedItem.objects.filter(
    identifier=record['identifier'],
    hit_type='investigations',
    is_machine=False,
    is_robot=False
).values(
    'identifier', 'country'
).annotate(
    count=models.Count('identifier'),
    metric_type=models.Value("total-dataset-requests", output_field=models.CharField()),
    access_method=models.Value("regular", output_field=models.CharField())
)

This gives me a <Queryset []> as follows:

<QuerySet [{'identifier': '10.80519/954e-4db4', 'country': 'fr', 'count': 1, 'metric_type': 'total-dataset-requests', 'access_method': 'regular'}, {'identifier': '10.80519/954e-4db4', 'country': 'gb', 'count': 5, 'metric_type': 'total-dataset-requests', 'access_method': 'regular'}]>

As you can see, I have all of the data I need for mimicking the above data stucture. But in a sightly obscure format ... I can go in a clean this up with Python, from a values_list() or iterator(), but I'd like to do most of the heavy lifting on the database layer via the ORM.

So I guess my question is, how could I replicate the kid of aggregation as shown in the JSON structure as an ORM query...?

FYI: It would likely be useful to know The full LogItem from models.py:

class LogItem(models.Model):
    ....

    id = models.AutoField(_('ID'), primary_key=True)

    session_id = models.TextField(_('Session ID'), default='')

    doubleclick_id = models.TextField(_('Doubleclick ID'), default='')

    event = models.DateTimeField(_('Event Recorded At'), default=now, blank=True)

    client_ip = models.CharField(_('Client IP'), max_length=15, null=True, blank=True)

    session_cookie_id = models.CharField(_('Session Cookie ID'), max_length=120, null=True, blank=True)

    user_cookie_id = models.CharField(_('User Cookie ID'), max_length=120, null=True, blank=True)

    user_id = models.CharField(_('User ID'), max_length=100, null=True, blank=True)

    request_url = models.TextField(_('Request URL'), default='')

    identifier = models.CharField(_('Identifier'), max_length=120, null=True, blank=True)

    filename = models.TextField(_('Filename'), null=True)

    size = models.PositiveBigIntegerField(_('Size'), null=True)

    user_agent = models.TextField(_('User Agent'), default='')

    # Alpha-2 ISO 3166 Codes: [Reference: Country Codes Alpha-2 & Alpha-3](https://www.iban.com/country-codes)
    country = models.CharField(_('Country'), max_length=10, default='gb')

    hit_type = models.CharField(_('Hit Type'), max_length=60, null=True, blank=True)

    is_robot = models.BooleanField(_('Is Robot'), default=False)

    is_machine = models.BooleanField(_('Is Machine'), default=False)
Micheal J. Roberts
  • 3,735
  • 4
  • 37
  • 76
  • you should just take the structure you have and map it into the structure you want in code. – bryan60 Oct 07 '20 at 15:24
  • @bryan60 I need the logs in this format for another validation step, unfortunately. – Micheal J. Roberts Oct 07 '20 at 15:35
  • so map it before that step? – bryan60 Oct 07 '20 at 15:36
  • They're logged one at a time. – Micheal J. Roberts Oct 07 '20 at 15:40
  • 2
    You’re not really being clear here on what you need Or what you mean. From what i see in your question, You’ve already done all of the heavy Querying at the database layer And got the aggregated information you want, just in a slightly different format. If that’s true, The solution here is clearly to just use your query set and modify it rather than trying to coax a SQL orm into giving you non tabular data. If not, you should clarify what exactly you’re not getting in the result that you need, beyond formatting. Fetching data is the province of the database, formatting it is for the view. – bryan60 Oct 07 '20 at 15:46
  • So although formatting is the province of the view, which I have done, the formatting requires further evaluations. With Django, querysets are evaluated in a lazy fashion so I don't want to be hiting a DB with more queryset requests that I need, so I was wondering if there was a way of doing the above in one queryset which some annotations, but I'm not sure where to start exactly for the structure I desire. – Micheal J. Roberts Oct 07 '20 at 16:03
  • django querysets are lazy, which means they don't evaluate till they have to, but also means after they evaluate once, they don't evaluate again. If all of the information you need is in the queryset, why would formatting it result in multiple evaluations? Have you actually tested this and do you have output for dupe queries? Again, it looks like you already have all of the information you need in one query, but the format is slightly off, so the solution is to just modify the format in the view. if you do not have all of the information you need, then please clarify what is missing. – bryan60 Oct 07 '20 at 16:12
  • @MichealJ.Roberts Gentile Info. When you put your code in SO, please do make it a ***simple and minimal one***. I can see many irrelevant fields in this context which polluting the ***"readability"*** – JPG Oct 09 '20 at 03:21
  • @MichealJ.Roberts It is not a ***"obscure format"*** it how a *SQL GROUP BY* return the results. – JPG Oct 09 '20 at 03:39
  • @ArakkalAbu Of course, it is a standard QuerySet format. I think I meant "obscure relative to what I want", which is, granted, selfish. So yes, it's not obscure. Thank you. I am just wondering if it is even *possible* to aggregate or annotate to something closer to the data structure above. I believe like George says in his answer below, it will need to be done on the serializer level. – Micheal J. Roberts Oct 09 '20 at 08:55

1 Answers1

0

I think you can only do this in python and not QuerySet, as Queryset is a database "SELECT" representation, and as far as i know you can not include dictionary "country-counts" in your queryset rather than build the result Either in a custom model Serializer Method or in python code:

Serializer Example:

    class LogItemSerializer(serializers.ModelSerializer):
        country_counts = serializers.SerializerMethodField()

        class Meta:
            model = LogItem

        def get_country_counts(self, obj):
            # create the dic you want
            result = {
                  "au": 6,
                  "jp": 1,
                  "us": 13
                  }
            return result