0

I have to present a very complex page with a lot of data coming from 3 different tables related with ForeignKey and ManyToManyField... I was able to do what I want but performance is terrible and I'm stuck trying to find a better approach... here are detailed code:

Models:

class CATSegmentCollection(models.Model):
    theFile = models.ForeignKey('app_file.File', related_name='original_file')
    segmentsMT = models.ManyToManyField('app_mt.MachineTransTable', related_name='segMT', blank=True,)
    segmentsTM = models.ManyToManyField('app_tm.TMTable',           related_name='segTM', blank=True, through='app_cat.TM_Source_quality',)
    ...

class TM_Source_quality(models.Model):
    catSeg = models.ForeignKey('app_cat.CATSegmentCollection')
    tmSeg = models.ForeignKey('app_tm.TMTable')
    quality = models.IntegerField()

class MachineTransTable(models.Model):
    mt = models.ForeignKey('app_mt.MT_available', blank=True, null=True, )
    ...

class TMTable(models.Model):
    ...

From these models (I just wrote what is relevant to my problem) I present all the CATSegmentCollection entries related to a single file... together with its associated TM and MT segments. In other words each entry in CATSegmentCollection has zero or more TM segment from the TMTable table and zero or more MT segment from the MachineTransTable table.

This is what I do in the ListView (and I use AjaxListView because I'm using a infinite scrolling pagination from django-el-pagination):

class CatListView(LoginRequiredMixin, AjaxListView):
    Model = CATSegmentCollection
    template_name = 'app_cat/cat.html'
    page_template='app_cat/cat_page.html'

    def get_object(self, queryset=None):
        obj = File.objects.get(id=self.kwargs['file_id'])
        return obj

    def get_queryset(self):
        theFile = self.get_object()
        return CATSegmentCollection.objects.filter(theFile=theFile).prefetch_related('segmentsMT').prefetch_related('segmentsTM').order_by('segment_order')

    def get_context_data(self, **kwargs):
        context = super(CatListView, self).get_context_data(**kwargs)
        contextSegment = []
        myCatCollection = self.get_queryset()
        theFile = self.get_object()
        context['file'] = theFile
        for aSeg in myCatCollection:
            contextTarget = []
            if aSeg.segmentsTM.all():
                for aTargetTM in aSeg.tm_source_quality_set.all():
                    percent_quality = ...
                    contextTarget.append( {
                        "source"        : aTargetTM.tmSeg.source,
                        "target"        : aTargetTM.tmSeg.target,
                        "quality"       : str(percent_quality) + '%',
                        "origin"        : "TM",
                        "orig_name"     : aTargetTM.tmSeg.tm_client.name,
                        "table_id"      : aTargetTM.tmSeg.id,
                    })
            if aSeg.segmentsMT.all():
                for aTargetMT in aSeg.segmentsMT.all():
                    contextTarget.append( {
                        "target"    : aTargetMT.target,
                        "quality"   : "",
                        "origin"    : "MT",
                        "orig_name" : aTargetMT.mt.name,
                        "table_id"  : aTargetMT.id
                    })
            contextSegment.append( {
                "id"            : aSeg.id,
                "order"         : aSeg.segment_order,
                "source"        : aSeg.source,
                "target"        : contextTarget,
            })
        context['segments'] = contextSegment
        return context

Everything works but:

  • I hit the DB each time i call aSeg.segmentsTM.all() and aSeg.segmentsMT.all() because I guess the prefetch is not preventing it... this result in hundreds of duplicated queries
  • All these queries are repeated each time I load more entries from the paginations (in other words... each time more entries are presented because of scrolling, the full set of entries are requested... I tried also using lazy_paginate but nothing changes)
  • In principle all the logic I have in get_context_data (there is more but I just presented the essential code) could be reproduced in the template passing just the queryset... or by the client with a lot of jquery/javascript code but I don't think it's a good idea to proceed like this...

So my question is... I can optimize this code reducing the number of DB hits and the time to produce the response? Just to give you an idea a relative small size file (with 300 entries in the CATSegmentCollection) load in 6.5 sec with 330 queries (more than 300 duplicated) taking 0.4 sec. The DJDT time analysis gives

domainLookup    273 (+0)
connect         273 (+0)
request         275 (+-1475922263356)
response        9217 (+-1475922272298)
domLoading      9225 (+-1475922272306)

Any suggestions? Thanks

Attilio
  • 77
  • 1
  • 9
  • Is there any way you could fetch the data only once then process the links in python? Some complex queries are inmune to prefetching because of the logic you need to iterate over them. – Lorenzo Peña Oct 08 '16 at 21:38

2 Answers2

0

Optimizing number of queries is a quite tricky issue as pinpointing which exact code triggered that extra query is not obvious. So I would suggest to comment out all the code inside that for loop and start uncommenting it line by line while monitoring which exact line causes extra queries, and optimize it gradually.

A few observations:

  • You need to carefully declare all deep relations you touch inside prefetch_related, like:

    .prefetch_related('segmentsTM', 'segmentsTM__tm_source_quality_set', 'segmentsTM__tm_source_quality_set__tmSeg', 'segmentsTM__tm_source_quality_set__tmSeg__tm_client', 'segmentsMT', 'segmentsMT__mt')
    
  • No need to check if aSeg.segmentsMT.all(): before looping over it as it would still return an empty iterable.

  • Unrelated note regarding related_name='segMT' in your CATSegmentCollection model. related_name field is used to declare how the current model should be accessed from the other side of a relation, so you would probably want something like related_name='cATSegmentCollections' for both fields

At the end you should be able to optimize it down to somewhere around 10 queries (around one for each relation). The success criteria is not having any numerous WHERE foreign_id=X queries and have only WHERE foreign_id IN (X,Y,...) type of queries.

serg
  • 109,619
  • 77
  • 317
  • 330
  • Thanks... I'm working using your suggestions but I'm not sure what to do with the through table... prefetching 'segmentsMT' and 'segmentsMT__mt' solve the problem for the second for loop. For the first loop (which has a through table) the problem is different. The command 'for aTargetTM in aSeg.tm_source_quality_set.all():' trigger always several hundreds queries. I'm using 'segmentsTM', 'segmentsTM__tm_source_quality_set', 'segmentsTM__tm_source_quality_set__catSeg', 'segmentsTM__tm_source_quality_set__tmSeg' for prefetching but it's not enough... – Attilio Oct 09 '16 at 11:35
  • After many test I was able to get what I needed... the key point was to try to use the Prefetch command... I think it's worth to add an answer... I didn't find anywhere an example. – Attilio Oct 13 '16 at 18:36
  • @Attilio You can just post your solution as a separated answer and accept it. – serg Oct 14 '16 at 06:50
0

Following serg suggestions I started to dig into the problem and at the end I was able to prefetch all the needed info. I guess that using a thorough table change the way prefetching works... Here is the correct queryset:

all_cat_seg = CATSegmentCollection.objects.filter(theFile=theFile).order_by('segment_order')
all_tm_source_quality_entries = TM_Source_quality.objects.filter(catSeg__in=all_cat_seg).select_related('tmSeg','tmSeg__tm_client')
prefetch = Prefetch('tm_source_quality_set',queryset=all_tm_source_quality_entries)
CATSegmentCollection.objects.filter(theFile=theFile).prefetch_related(
    prefetch,
    'segmentsMT',
    'segmentsMT__mt'
).order_by('segment_order')

With this queryset, I was able to reduce the number of queries to 10...

Attilio
  • 77
  • 1
  • 9