2

Is it possible to optimize Django to not run so many queries on something like this

for student in Student.objects.all():
    for course in student.course_set.all():
        for grade in course.grade_set.filter(student=student):
            # do stuff

The amount of queries is students * courses * grades which can get huge.

*edit One possibility after getting some ideas from roseman's blog.

for grade in student.grade_set.order_by('course', 'marking_period').select_related():
    if grade.marking_period_id in some_report_input:
        # do stuff

That's just a snippet but basically I replaced the for loops with just one for loops for the last item I care about (grades) Grades has references to everything I need (student, course, marking period). It was key to use things like marking_period_id instead of grade.marking_period (which does another query).

The trade off is code readability. I wanted to filter out grades and organize based on a criteria. This goes from trivial to convoluted.

This is by no means a generic solution. I'm sure there are times when this won't help at all. Please comment if you know a better way.

Another example:

for student in students:
  print student
  for department in departments:
    print department
    failed_grades = Grade.objects.filter(course__department=department,course__courseenrollment__user=student,grade__lte=70)
      for failed_grade in failed_grades:
        print grade.course
        print grade.grade

A student gets enrolled in a course. A course has a department.

Bufke
  • 3,195
  • 3
  • 28
  • 28

2 Answers2

1

It would be helpful if you post your models code and the "do stuff" code you ommit. This way, we could understand how to make an efficient query in your case.

Nevertheless, i think this could be helpful. It covers some cases that select_related doesn't. Note that prefetch_related is available since Django 1.4 so you may need to upgrade to this version in order to use it.

It's really important for us to help you that you add your "do stuff" code here, and if it is relevant (and i think it will be) add your models code here (just the fields declarations will be fine). Because the way of getting an optimized query depends on how your models are related and how you are using the queryset results.

EDIT:

in order to optimize the last "for" of your last example, you can do this:

failed_grades = Grade.objects.filter(course__department=department,course__courseenrollment__user=student,grade__lte=70).select_related('course')
for failed_grade in failed_grades:
    print grade.course
    print grade.grade

In this example, when you do grade.course, the select_related part of that query caches all the courses related to the filtered grades, so you can use them making just one query. So, if the __unicode__ method of Course model use only its own fields (i mean, if you don't show any other model data in Course's unicode method) you should get a better performance (less queries) that in your example. I'm not sure how to improve the other for statements as you want. But i think this can help you to get what you want (maybe i'm not understanding your models too much to help you better)

marianobianchi
  • 8,238
  • 1
  • 20
  • 24
  • Thanks for answering. I haven't seen any obvious way prefetch_related will help, but I need to investigate it more. You may view the full code [here](http://code.google.com/p/student-worker-relational-database/source/browse/ecwsp/sis/report.py) starting at line 237. I'll try to come up with some code that is more complete that the above, without the entire messy source file. – Bufke Jun 08 '12 at 16:06
  • I tried to see your code and i get scared... If you have a method with more than 200 lines line `pod_report_grade`, you have a bigger problem to attend before optimizing a query. Please read django documentation. Make the [tutorial](https://docs.djangoproject.com/en/1.4/intro/tutorial01/). Start all over again because that code is impossible to mantain... – marianobianchi Jun 09 '12 at 01:56
  • Yes, it's really, really bad. It also has thousands of users, real life sucks sometimes. I've provided a second example. – Bufke Jun 11 '12 at 22:39
-1

you can use select_related() and it will only hit the database once.

More info in this link (django's documentation) http://docs.djangoproject.com/en/1.2/ref/models/querysets/#select-related

An example of how you could use this in your case

for x in Student.objects.select_related():
    do stuff with x.course.grade`
EroSan
  • 339
  • 5
  • 13
  • As of Django 1.2 select_related works for reverse relations, but only with OneToOne relations... but I think you are right. This solution is not going to work this time... maybe doing `for grade in Grade.objects.select_related()` and then doing the filter? – EroSan Feb 28 '11 at 20:07
  • I can confirm looking at mysql "status;" select_related results in exactly the same number of queries. – Bufke Feb 28 '11 at 20:08
  • 1
    Yeah, Daniel got me wondering how to achieve this, though... and while researching how to optimize it i found this article (on his blog, of all places) http://blog.roseman.org.uk/2010/01/11/django-patterns-part-2-efficient-reverse-lookups/ and sounds like it could help... – EroSan Feb 28 '11 at 20:19