2

I have a simple database with about 3900 entries, and am using a generic view (django.views.generic.list_detail.object_list) with its django-pagination (through paginate_by) to browse the data in the database, but some queries are very slow.

The weird thing is that despite only showing 50 objects per page the rendering time scales roughly linearly with how many objects are selected (and I do not do any sorting of objects). E.g., if I do a query with ~3900, ~1800, ~900, ~54 selected objects it respectively takes ~8500 ms, ~4000 ms, ~2500 ms, ~800 ms of CPU time (using django-debug-toolbar) while the SQL only took ~50 ms, ~40 ms, ~35 ms, ~30 ms, again while all pages had exactly 50 objects. I have minimized the number of SQL queries using select_related as suggested in the django optimization page.

Using profiling middleware the vast majority of the time on long queries is spent doing db stuff:

         735924 function calls (702255 primitive calls) in 11.950 CPU seconds

   Ordered by: internal time, call count

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
35546/3976    4.118    0.000    9.585    0.002 /usr/local/lib/python2.6/dist-packages/django/db/models/query.py:1120(get_cached_row)
    30174    3.589    0.000    3.991    0.000 /usr/local/lib/python2.6/dist-packages/django/db/models/base.py:250(__init__)

 ---- By file ----

      tottime
47.0%   3.669 /usr/local/lib/python2.6/dist-packages/django/db/models/base.py
 7.7%   0.601 /usr/local/lib/python2.6/dist-packages/django/db/models/options.py
 6.8%   0.531 /usr/local/lib/python2.6/dist-packages/django/db/models/query_utils.py
 6.6%   0.519 /usr/local/lib/python2.6/dist-packages/django/db/backends/sqlite3/base.py
 6.4%   0.496 /usr/local/lib/python2.6/dist-packages/django/db/models/sql/compiler.py
 5.0%   0.387 /usr/local/lib/python2.6/dist-packages/django/db/models/fields/__init__.py
 3.1%   0.244 /usr/local/lib/python2.6/dist-packages/django/db/backends/util.py
 2.9%   0.225 /usr/local/lib/python2.6/dist-packages/django/db/backends/__init__.py
 2.7%   0.213 /usr/local/lib/python2.6/dist-packages/django/db/models/query.py
 2.2%   0.171 /usr/local/lib/python2.6/dist-packages/django/dispatch/dispatcher.py
 1.7%   0.136 /usr/local/lib/python2.6/dist-packages/django/template/__init__.py
 1.7%   0.131 /usr/local/lib/python2.6/dist-packages/django/utils/datastructures.py
 1.1%   0.088 /usr/lib/python2.6/posixpath.py
 0.8%   0.066 /usr/local/lib/python2.6/dist-packages/django/db/utils.py
...
 ---- By group ---

      tottime
89.5%   6.988 /usr/local/lib/python2.6/dist-packages/django/db
 3.6%   0.279 /usr/local/lib/python2.6/dist-packages/django/utils
...

I can understand why the SQL query could scale with the number of selected entries. However, I don't see why the rest of the CPU time should be in anyway affected. This is very counterintuitive and I was wondering if there's any debugging/profiling tips someone could help me with.

Using django-1.2.3 with sqlite, python2.6, apache2-prefork (though switching to mpm-worker didn't significantly change things). Any tips/tricks would be greatly appreciated. Memory usage doesn't seem to be a factor (machine has 2Gb RAM and free says only using 300Mb in use (additionally 600Mb of cache)) either and the database is on the same server as the machine.

Found my mistake. I found my mistake. I checked the length of the original queryset to see if it was length 1 (and then went to object_detail if so). This resulted in evaluating the full queryset (which still only took 5ms according to django-debug-toolbar), but slowed everything down significantly.

Basically had something stupid like:

    if len(queryset) == 1:                                 
        return HttpResponseRedirect( fwd to object_detail url ...)
    return object_list(request, queryset=queryset, paginate_by=  ...)

which evaluated the full query; not the paginated query.

dr jimbob
  • 17,259
  • 7
  • 59
  • 81
  • 1
    Do you have indexes set up in your sqlite database? Otherwise sqlite might be doing linear scan for filtering or sorting. – Mike Axiak Nov 01 '10 at 18:54
  • I don't have any indexes set up at the moment; but I'm not sorting the results either and I'm under the impression that the django-debug-toolbar times the SQL queries. So if the SQL queries total 50 ms for selecting 1-50 of ~3900 objects and ~30 ms for selecting 1-50 of ~54 objects, that the difference in CPU time should only be 20 ms, not the current difference of 7800 ms. – dr jimbob Nov 01 '10 at 19:19

1 Answers1

3

When django does pagination it will use standard QuerySet slicing to get the results, this means it will use LIMIT and OFFSET.

You can view the SQL the ORM generates by calling str() on the .query attribute of the QuerySet:

    print MyModel.objects.all().query
    print MyModel.objects.all()[50:100].query

You can then ask sqlite to EXPLAIN the query and see what the database is trying to do. I'm guessing you are sorting on some field that does not have an index. EXPLAIN QUERY PLAN will tell you what indices would have been used, according to the sqlite documentation at http://www.sqlite.org/lang_explain.html

knutin
  • 5,033
  • 19
  • 26
  • This was helpful and led me along the track to really read through the SQL queries (explained in edited question above). – dr jimbob Nov 01 '10 at 19:45
  • 1
    If you want more information on when a query is evaluated, see http://docs.djangoproject.com/en/dev/ref/models/querysets/#when-querysets-are-evaluated You could try using QuerySet...count() instead of len(). – knutin Nov 01 '10 at 19:52