0

I'm working on a project in which around 3 million records are saved in a MySQL database. The model is like:

class Record(models.Model):
    rec_type = models.CharField(...)
    rec_value = models.FloatField(...)
    rec_prop1 = models.CharField(...)
    ...other fields...

    class Meta:
        ordering = ['rec_value']

A typical query contains a range of target rec_value, a specific rec_type, and sometimes, a specific rec_prop1. The query action is much more frequently used than the adding record action.

My query function is written like this:

def find_target(value_from,value_to,type=None,prop1=None):
    search_set = Record.objects.all()
    if type: #not None
        search_set = search_set.filter(rec_type=type)
        if search_set.count == 0:
            return []
    if prop1: #not None
        search_set = search_set.filter(rec_prop1=type)
        if search_set.count == 0:
            return []

    search_list = search_list.filter(rec_value__gte=value_from,rec_value__lte=value_to)
    result_list = []

    for rec in search_list.values(...): #only get useful fields
        result_list.append(some_runtime_calculation_about_rec)

    return result_list

The code works fine but takes around 7s for each query. No indexing is used currently. I want to improve query performance. I have searched the Internet for solutions and learned to use QuerySet.values() and database indexing. The problem is that rec_type field only has 3 possible values(eg. A, B, C), and the most of the records (around 70%) belong to one of them(eg. A). The rec_value field is filtered in every query so I made it ordered in the class Meta in the model. The rec_prop1 has around 10 possible values but in most of queries, it remains to be None, which means no filtering. My question is, should I index the rec_typerec_value or rec_prop1 ?Given the rec_value is already ordered and rec_type has imbalance distribution and a few possible values and rec_prop1 usually not filtered? Is there any way to further improve the performance of the query function in my case? I'm still learning about Django and Database. I really appreciate your suggestions and helps. Thanks a lot.

Hansen Zhao
  • 27
  • 2
  • 7

1 Answers1

0

I'm recently continuing to optimize the code. Firstly, I found that the code

search_set.count == 0

can be optimized as:

search_set.exists()

which significantly improved the performance. (Search time from 7s to 2.3s). Secondly, indexing seems to have little effects on performance.

Hansen Zhao
  • 27
  • 2
  • 7