2

I have a database table in PostgreSQL which contains 10,000,000 rows and 60 columns (features). I define a Django queryset as follows:

MyQ = MyDataBase.objects.filter(Name='Mike', date=date(2018, 2, 11), Class='03')

There are only 5 rows that satisfy the above filter. But when I try something like

MyQ.count() #which equals 5

or

MyQ.aggregate(Sum('Score'))['Score__sum'] #which equals 61

each takes about 3 minutes to give me the result. Isn't that weird? Aren't querysets supposed to make life easier by focusing only on the rows we have told them to focus on? Counting 5 rows, or summing one of their fields, shouldn't take that long. What am I doing wrong?

I should also say this: the first time I ran this code on this table, everything was fine and it took maybe 1 second to fetch the result, but now the 3 minutes is really annoying. And I have not changed anything in the database or the code since then.

Mike
  • Sixty columns is a lot, but by no means something that your database shouldn't be able to reasonably handle. What indexes exist on the table you're querying? I imagine if the query is taking 3 minutes, the query planner is having to do a full table scan on the 10,000,000 rows. – Ben Siver Feb 14 '19 at 21:52
  • I have trouble understanding this index thing as well. Thank you for bringing it up. How can I answer this question: "What indexes exist on the table you're querying?" – Mike Feb 14 '19 at 21:56
  • Generally if you are filtering your table based on a certain field or number of fields, you should create an index on those fields. It allows the database query planner to take a more optimized path when searching/sorting. It looks like you're using Postgres from your question, so you can run `SELECT * FROM pg_indexes WHERE tablename = 'yourtable';` in psql to see any existing indexes. Django can create these indexes for you in your model definition. Here's some more reading specific to Django: https://www.gun.io/blog/learn-indexing-dammit – Ben Siver Feb 14 '19 at 22:02
  • @BenSiver It worked!!! And the speed is now marvelously high. It's strange how such a small modification can be so effective. Thanks a lot – Mike Feb 15 '19 at 15:42
  • Awesome, glad it worked for you! I updated my comment as an answer here in case others see a similar issue. – Ben Siver Feb 15 '19 at 15:47

2 Answers

3

Generally if you are filtering your table based on a certain field or number of fields, you should create an index on those fields. It allows the database query planner to take a more optimized path when searching/sorting.

It looks like you're using Postgres from your question, so you can run `SELECT * FROM pg_indexes WHERE tablename = 'yourtable';` in psql to see any existing indexes.
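
To confirm whether an index is actually being used, you can also ask Postgres for the query plan. A minimal sketch (the table and column names here are assumptions based on the question):

-- Show how Postgres executes the query and how long each step takes
EXPLAIN ANALYZE
SELECT count(*)
FROM mydatabase
WHERE "Name" = 'Mike'
  AND date = '2018-02-11'
  AND "Class" = '03';

A `Seq Scan` node in the output means a full table scan; once a suitable index exists, you should see an `Index Scan` or `Bitmap Index Scan` instead.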

Django can create these indexes for you in your model definition. For example, your model MyDatabase might look something like this:

from django.db import models

class MyDatabase(models.Model):
    # db_index=True asks Django to create a single-column index on each field
    Name = models.TextField(db_index=True)
    date = models.DateField(db_index=True)
    Class = models.TextField(db_index=True)

Here's some more reading specific to creating indexes on Django models: gun.io/blog/learn-indexing-dammit

Ben Siver
  • Thanks. I used: `class Meta(object): indexes = [models.Index(fields=['Name', 'date', 'Class'])]` – Mike Feb 15 '19 at 18:19
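
Spelled out, the composite index from that comment would look like this on the model (a sketch: a single multi-column index covering all three filtered fields instead of three separate single-column indexes):

from django.db import models

class MyDatabase(models.Model):
    Name = models.TextField()
    date = models.DateField()
    Class = models.TextField()

    class Meta:
        # One composite index covering all three fields in the filter;
        # field order matters, so it helps most when queries filter on a
        # leading subset of ['Name', 'date', 'Class'].
        indexes = [
            models.Index(fields=['Name', 'date', 'Class']),
        ]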
0

The database table you are filtering on should be well indexed.

If you aggregate multiple times with group_by, then use a CTE. The django-cte package supports this, as sketched below.
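
As a rough sketch of that idea, with a made-up Order model and going by the `With`/`with_cte` API from the django-cte README:

from django.db import models
from django.db.models import Sum
from django_cte import CTEManager, With

class Order(models.Model):
    # django-cte requires its manager on models used in CTE queries
    objects = CTEManager()
    region_id = models.IntegerField()
    amount = models.IntegerField()

# Define the GROUP BY aggregation once as a CTE...
cte = With(
    Order.objects
    .values('region_id')
    .annotate(total=Sum('amount'))
)

# ...then reuse its result in the outer query instead of re-aggregating
orders = (
    cte.join(Order, region_id=cte.col.region_id)
    .with_cte(cte)
    .annotate(region_total=cte.col.total)
)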

If you have historical data and perform aggregations repeatedly, use a materialized view in Postgres (or the equivalent in your database). You can refresh the data periodically by scheduling a job, and you can also create indexes on the materialized view, as in the sketch below.
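
A minimal Postgres sketch of that approach (the view, table, and column names are assumptions based on the question):

-- Precompute the aggregation once and store the result
CREATE MATERIALIZED VIEW score_totals AS
SELECT "Name", date, "Class", sum("Score") AS total_score
FROM mydatabase
GROUP BY "Name", date, "Class";

-- Indexes can be created on a materialized view, just like on a table
CREATE INDEX score_totals_idx ON score_totals ("Name", date, "Class");

-- Refresh periodically (e.g. from a scheduled job) to pick up new data
REFRESH MATERIALIZED VIEW score_totals;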