3

I am trying to speed up my code which currently takes a little over an hour to run in Python / Numpy. The majority of computation time occurs in the function pasted below.

I'm trying to vectorize Z, but I'm finding it rather difficult for a triple for loop. Could I possible implement the numpy.diff function somewhere? Take a look:

def MyFESolver(KK,D,r,Z):
    global tdim
    global xdim
    global q1
    global q2
    for k in range(1,tdim):
        for i in range(1,xdim-1):
            for j in range (1,xdim-1):
                Z[k,i,j]=Z[k-1,i,j]+r*q1*Z[k-1,i,j]*(KK-Z[k-1,i,j])+D*q2*(Z[k-1,i-1,j]-4*Z[k-1,i,j]+Z[k-1,i+1,j]+Z[k-1,i,j-1]+Z[k-1,i,j+1])
    return Z

tdim = 75 xdim = 25

Zack
  • 713
  • 2
  • 8
  • 21
  • Try the line `Z[1:, 1:-1, 1:-1] = Z[:-1, 1:-1, 1:-1] + r*q1*Z[:-1, 1:-1, 1:-1]*(KK-Z[:-1, 1:-1, 1:-1]) + D*q2*(Z[:-1,:-2,1:-1] - 4*Z[:-1, 1:-1, 1:-1] + Z[:-1, 2:, 1:-1] + Z[:-1, 1:-1, :-2] + Z[:-1, 1:-1, 2:])` instead of your triple for. – halex Oct 25 '12 at 10:34
  • 1
    unrelated, but important: you probably want to recheck the use of the `global` keyword. It is useless in your case. – Simon Bergot Oct 25 '12 at 12:43

2 Answers2

4

I agree, it's tricky because the BCs on all four sides, ruin the simple structure of the Stiffness matrix. You can get rid of the space loops as such:

from pylab import *
from scipy.sparse.lil import lil_matrix
tdim = 3;     xdim = 4;  r = 1.0;  q1, q2 = .05, .05; KK= 1.0; D = .5  #random values
Z = ones((tdim, xdim, xdim))
#Iterate in time
for k in range(1,tdim):
    Z_prev = Z[k-1,:,:] #may need to flatten
    Z_up = Z_prev[1:-1,2:]
    Z_down = Z_prev[1:-1,:-2]

    Z_left = Z_prev[:-2,1:-1]
    Z_right = Z_prev[2:,1:-1]

    centre_term  = (q1*r*(Z_prev[1:-1,1:-1] + KK) - 4*D*q2)* Z_prev[1:-1,1:-1] 

    Z[k,1:-1,1:-1]= Z_prev[1:-1,1:-1]+ centre_term + q2*(Z_up+Z_left+Z_right+Z_down)

But I don't think you can get rid of the time loop...

I think the expression:

Z_up = Z_prev[1:-1,2:]

makes a copy in numpy, whereas what you want is a view - if you can figure out how to do this - it should be even faster (how much?)

Finally, I agree with the rest of the answerers - from experience, this kind of loops are better done in C and then wrapped into numpy. But the above should be faster than the original...

alexandre iolov
  • 582
  • 3
  • 11
  • 2
    Yes right, everything but the time loop is OK to vectorize. Numpy always creates views for slicing operations though. – seberg Oct 25 '12 at 14:22
2

This looks like an ideal case for Cython. I'd suggest writing that function in Cython, it'll probably be hundreds of times faster.

tiago
  • 22,602
  • 12
  • 72
  • 88