I have a Django model with score and quizzes_done fields. I want to calculate percentiles on both score and quizzes_done. To do that, I need to create two sorted lists. For the first one, I do:

s_recs = MyRecord.objects.all().order_by('score')
score_list = [r.score for r in s_recs]

I can do the same to get a sorted quizzes_list, or I can sort the list in Python. I am wondering which one is faster.

Reuse the s_recs queryset from above and sort in Python:

quizzes_list = [r.quizzes_done for r in s_recs]
quizzes_list.sort()

OR

q_recs = MyRecord.objects.all().order_by('quizzes_done')
quizzes_list = [r.quizzes_done for r in q_recs]
zaphod
  • 1
    try it and find out ... it probably depends on several factors ... I would guess for larger lists it will be faster to have sqlite or whatever ORDER BY – Joran Beasley Jul 22 '13 at 23:50
  • 1
    I bet the SQL solution is faster, and at the same time you can offload the processing load to the database too. – Hieu Nguyen Jul 22 '13 at 23:57
  • 5
    "It depends" is right. Python may well be faster for smaller datasets, but you generally want your database to do the ordering, as it "knows" what the data looks like, and may well already have an ordered (btree) index on the column you want to sort by, which will mean returning sorted results can be done with an `O(N)` index scan rather than an `O(N log N)` sort. Another good question is, "does it matter?" – Ben Hoyt Jul 23 '13 at 00:14
  • 2
    Yes somehow it sounds like premature optimization to me – Hieu Nguyen Jul 23 '13 at 00:53
  • 1
    I agree with Ben's comment. Also, if you need to optimise retrieving thousands of objects, it might be worth using custom SQL that selects fewer columns. It depends on what you want to achieve and how many scores you are talking about. – François Constant Jul 23 '13 at 02:10
  • 1
    Thank you for the comments. I tried to test, and you folks are correct - for my dataset size, I do not see any significant difference. I will probably use the DB option, and hopefully, will reach a stage someday when I will need to optimize this. – zaphod Jul 24 '13 at 22:30

1 Answer

Here is a small function you can use to time both approaches:

import time

def timedcall(fn, *args):
    "Call function with args; return the time in seconds and result."
    t0 = time.perf_counter()  # wall-clock timer; time.clock() was removed in Python 3.8
    result = fn(*args)
    t1 = time.perf_counter()
    return t1 - t0, result

This returns the elapsed time and whatever the called function returns. I think you will find that unless you are sorting tens of thousands of records or more there will be little difference, but I would be interested to hear the results.
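
As a rough sketch of how such a comparison might look, the snippet below times both approaches against an in-memory SQLite table from the standard library, used here only as a stand-in for the Django model. The table name, the helper functions (make_db, sort_in_python, sort_in_db), and the row count are all made up for illustration, and the numbers will differ on a real database. With the Django ORM, keep in mind that querysets are lazy, so whatever you time has to build the list (as the list comprehensions in the question do) for the query to actually run.

```python
import random
import sqlite3
import time


def timedcall(fn, *args):
    """Call fn(*args); return (elapsed seconds, result)."""
    t0 = time.perf_counter()
    result = fn(*args)
    return time.perf_counter() - t0, result


def make_db(n_rows):
    """Build an in-memory table standing in for MyRecord (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE myrecord (score INTEGER, quizzes_done INTEGER)")
    rng = random.Random(0)  # fixed seed so runs are repeatable
    rows = [(rng.randint(0, 100), rng.randint(0, 50)) for _ in range(n_rows)]
    conn.executemany("INSERT INTO myrecord VALUES (?, ?)", rows)
    return conn


def sort_in_python(conn):
    """Fetch the values unsorted, then sort the list in Python."""
    values = [row[0] for row in conn.execute("SELECT quizzes_done FROM myrecord")]
    values.sort()
    return values


def sort_in_db(conn):
    """Let the database do the ordering with ORDER BY."""
    query = "SELECT quizzes_done FROM myrecord ORDER BY quizzes_done"
    return [row[0] for row in conn.execute(query)]


if __name__ == "__main__":
    conn = make_db(50_000)
    py_time, py_list = timedcall(sort_in_python, conn)
    db_time, db_list = timedcall(sort_in_db, conn)
    assert py_list == db_list  # both approaches must produce the same list
    print(f"Python sort: {py_time:.4f}s, ORDER BY: {db_time:.4f}s")
```

Both functions fetch only the one column they need, so the comparison is about where the sort happens and not about how much data is transferred.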

ChrisProsser