
I'm writing a bigger program, and computing the determinants of 3x3 matrices as fast as possible is important for it to work well. I've read that I could use NumPy to do it, but I thought that writing my own code might be more educational, since I'm in my 3rd semester of CompSci.

So I wrote two functions, and I'm using time.clock() (I'm on a Win7 machine) to time how long each function takes to return a value.

This is the first function:

def dete(a):
    # Rule of Sarrus; note that the a[3] and a[4] indices assume the first
    # two rows have already been appended to the matrix (as the second
    # function does)
    x = (a[0][0] * a[1][1] * a[2][2]) + (a[1][0] * a[2][1] * a[3][2]) + (a[2][0] * a[3][1] * a[4][2])
    y = (a[0][2] * a[1][1] * a[2][0]) + (a[1][2] * a[2][1] * a[3][0]) + (a[2][2] * a[3][1] * a[4][0])
    return x - y

And this is the second function:

def det(a):
    a.append(a[0]); a.append(a[1])  # note: this mutates the caller's matrix
    # sum the three down-right diagonals
    x = 0
    for i in range(0, len(a) - 2):
        y = 1
        for j in range(0, len(a) - 2):
            y *= a[i + j][j]
        x += y
    # sum the three down-left diagonals
    p = 0
    for i in range(0, len(a) - 2):
        y = 1
        z = 0
        for j in range(2, -1, -1):
            y *= a[i + z][j]
            z += 1
        p += y
    return x - p

They both give the correct answers, but the first one seems to be slightly faster. That surprises me: since for-loops are more elegant and, I thought, usually faster, I figured I must be doing something wrong and had made the loops too slow and fat. I tried trimming them down, but the *= and += operations seem to take too much time; there are just too many of them. I haven't checked yet how fast NumPy handles this, but I want to get better at writing efficient code. Any ideas on how to make those loops faster?

Roman Bodnarchuk
Protagonist
  • Seems to be slightly faster? Please use `timeit` and a profiler to show **exactly** how much faster. – S.Lott Feb 26 '12 at 17:28
  • So for-loops are usually faster than a simple unrolled direct computation? Hmm... seems I need to really learn many things about Python ;) – Christian Rau Feb 26 '12 at 17:32
  • If you want to compute a 3x3-determinant there's nothing more elegant than this optimized formula from your first function. – Christian Rau Feb 26 '12 at 17:37
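As the first comment suggests, `timeit` gives more reliable numbers than ad-hoc clock calls. A minimal sketch of how to time the unrolled function; the 3x3 matrix is a made-up example, and `dete` is written here in the standard rule-of-Sarrus form so it works on a plain 3x3 list without appended rows:

```python
import timeit

# Standard rule-of-Sarrus form, equivalent to the question's first
# function but indexing a plain 3x3 list (no appended rows needed).
def dete(a):
    x = a[0][0]*a[1][1]*a[2][2] + a[0][1]*a[1][2]*a[2][0] + a[0][2]*a[1][0]*a[2][1]
    y = a[0][2]*a[1][1]*a[2][0] + a[0][0]*a[1][2]*a[2][1] + a[0][1]*a[1][0]*a[2][2]
    return x - y

m = [[2, -3, 1], [2, 0, -1], [1, 4, 5]]  # made-up example matrix

# Time 100,000 calls; globals() lets timeit see dete and m.
print(timeit.timeit("dete(m)", globals=globals(), number=100_000))
```

The same call with the loop-based `det` substituted in gives a directly comparable number.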

5 Answers


First let me note that micro-scale speed optimization is best done in another language, so you are better off using a library that wraps C-written functionality.

About your for loops: unrolling (small) loops is a common speedup technique, so it is not always faster to let loops do the job. Usually loops are just more generic (and most generic algorithms are in fact slower than specialized ones).

As stated in the comments, replacing `*` with repeated additions will not increase speed in Python, but reducing the total number of arithmetic operations might. Hence I will post the factored-out form here:

def dete(a):
    return (a[0][0] * (a[1][1] * a[2][2] - a[2][1] * a[1][2])
           -a[1][0] * (a[0][1] * a[2][2] - a[2][1] * a[0][2])
           +a[2][0] * (a[0][1] * a[1][2] - a[1][1] * a[0][2]))

As you can see, there are 5 +/- and 9 * here, whereas the original version has 5 +/- and 12 *. Also note that this version accesses a only 15 times, whereas the original accessed it 18 times.

In total, this saves 3 arithmetic operations and 3 element accesses compared with the fully multiplied version.
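A quick sanity check that the factored form agrees with the fully multiplied-out rule of Sarrus; the test matrix is a made-up example:

```python
# Factored cofactor expansion along the first column (from the answer above).
def dete_factored(a):
    return (a[0][0] * (a[1][1] * a[2][2] - a[2][1] * a[1][2])
            - a[1][0] * (a[0][1] * a[2][2] - a[2][1] * a[0][2])
            + a[2][0] * (a[0][1] * a[1][2] - a[1][1] * a[0][2]))

# Rule of Sarrus, fully multiplied out, for comparison.
def dete_expanded(a):
    return (a[0][0]*a[1][1]*a[2][2] + a[0][1]*a[1][2]*a[2][0] + a[0][2]*a[1][0]*a[2][1]
            - a[0][2]*a[1][1]*a[2][0] - a[0][0]*a[1][2]*a[2][1] - a[0][1]*a[1][0]*a[2][2])

m = [[2, -3, 1], [2, 0, -1], [1, 4, 5]]  # made-up example matrix
print(dete_factored(m), dete_expanded(m))  # both print 49
```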

  • Changing `*` for a series of sums in Python code would be slower. The reason Python is slow for arithmetic manipulation is that every operation involves introspection and method calling; that takes hundreds of times more cycles than the difference between a CPU-bound multiplication and an addition. – jsbueno Feb 26 '12 at 17:39
  • Ok I removed that, it was more of a note from my C++ background :) – Nobody moving away from SE Feb 26 '12 at 17:43
  • @jsbueno Changing * for a series of sums on any modern architecture would be slower. There's a reason ALUs have a native multiplication operation, and there's a reason JIT compilers use it. – David Souther Feb 26 '12 at 17:48
  • On basic cpu datatypes as floats or ints this may hold true, but most matrix operations take their advantage in higher level datatypes that still are much slower in `*` than in `-`. But the OP probably has only matrices of those "simple" types. – Nobody moving away from SE Feb 26 '12 at 17:51
  • For the record, factoring out common terms as Nobody originally suggested *does* reduce the total number of operations required. You can get it down to 14 arithmetic ops compared with 17, so it's not merely a multiplication-for-subtraction swap, and it's a win even if we stipulate that * costs as much as -. – DSM Feb 26 '12 at 18:01
  • Okay, thank you for the really quick answer. I'll try and go along with this, if I do need more speed, I'll rewrite the program in Java (Java is the language of my course, I'm learning Python on my own). – Protagonist Feb 26 '12 at 18:23

Loops are more elegant and more generic, but they are not "usually faster" than a couple of inline multiplications in a single expression.

For one, a for-loop in Python has to assemble the object over which you will iterate (the call to range), and then call a method on that iterator for every item in the loop.

So, depending on what you are doing: if the inline form is speedy enough for you, keep it. If it is still too slow (as is usually the case when we are doing numeric computation in Python), you should use a numeric library (for example NumPy) that can compute determinants in native code. For numeric manipulation code like this, native code can run hundreds of times faster.

If you need some numeric calculation that can't be performed by an already-made library and you seek speed (for example, for pixel manipulation in image processing), you may prefer to write an extension that runs as native code (using C, Cython, or something else) in order to make it fast.

On the other hand, if speed is not crucial (and you noted the inlined expression is only "slightly faster"), just use the full loop: you get more readable and maintainable code, which is the main reason for using Python after all.

In the specific example you gave, you can get some speed increase in the loop code by hardcoding the `range` calls as tuples, for example changing `for i in range(0, len(a)-2):` to `for i in (0, 1, 2):`. Note that, as in the inline case, you lose the ability to work with matrices of different sizes.
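A sketch of what that tuple-hardcoded variant of the question's `det` might look like; it still assumes a 3x3 input and still extends the matrix with its first two rows, but copies rather than mutates (the example matrix is made up):

```python
def det(a):
    # Rebind to an extended copy so the caller's matrix isn't mutated
    # (the original appended to `a` in place).
    a = a + [a[0], a[1]]
    x = 0
    for i in (0, 1, 2):           # hardcoded instead of range(0, len(a)-2)
        y = 1
        for j in (0, 1, 2):
            y *= a[i + j][j]      # down-right diagonal
        x += y
    p = 0
    for i in (0, 1, 2):
        y = 1
        for z, j in enumerate((2, 1, 0)):
            y *= a[i + z][j]      # down-left diagonal
        p += y
    return x - p

print(det([[2, -3, 1], [2, 0, -1], [1, 4, 5]]))  # → 49
```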

jsbueno
  • Thank you so much for the pretty exhaustive answer, I'll definitely try all those options you mentioned and if I do need the code to run faster, I'll redo it in Java. I'm really enjoying doing things in Python, it's so pleasant. – Protagonist Feb 26 '12 at 18:24

While you are unrolling it, as proposed above, you could also combine the two blocks into one; a quick glance found no dependencies between the two blocks (correct me if I'm wrong).
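A sketch of what that combined version might look like, computing both diagonal sums in a single pass (it assumes a 3x3 input, extends the matrix with a copy of its first two rows as the question's `det` does, and uses a made-up example matrix):

```python
def det(a):
    a = a + [a[0], a[1]]  # extend with the first two rows, non-destructively
    x = p = 0
    for i in range(3):
        y_pos = y_neg = 1
        for j in range(3):
            y_pos *= a[i + j][j]      # down-right diagonal
            y_neg *= a[i + j][2 - j]  # down-left diagonal
        x += y_pos
        p += y_neg
    return x - p

print(det([[2, -3, 1], [2, 0, -1], [1, 4, 5]]))  # → 49
```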

malexmave

It is almost impossible for loops to be faster than explicit, long expressions, so it is no wonder that the first variant is faster. I doubt you could come up with something faster than the first function.

Roman Bodnarchuk

You can unroll the loops and take advantage of the fact that you handle 3x3 matrices, not nxn matrices.

With this optimization you get rid of determining the size of the matrix: you trade flexibility for a little speedup, and you can simply write down the concrete formula for each term of the result. (C++ compilers, by the way, perform this kind of optimization automatically.)

I would only suggest doing so if you are really sure that such a small optimization is worth the specialized code. To make sure you optimize the right part of your code, use the profiling tools (see http://docs.python.org/library/profile.html) or timeit.

Jörg Beyer